




VOL. XXVI DECEMBER, 1935 NO. 9


The Journal of Educational 
Psychology 


Devoted Primarily to the Scientific Study of Problems of Learning and Teaching





CONTENTS 


Title-page and Index for Volume XXVI (1935) 


Technical Aspects of Multi-trait Tests. A Reply to Dr. Lorge. . . 641 
JOHN C. FLANAGAN 


Personality Traits by Fiat II: A Correction 
IRVING LORGE 


The Nature versus Nurture Problem. Part II . . . 655
FRANK K. SHUTTLEWORTH 


The Relationship of Number of Hours of Study to Scholarship. . . 682 
E. G. WILLIAMSON 


What Should Be Included in Educational Psychology? ..... . 689 
NOEL B. CUFF 


An Experiment on the Law of Effect in Learning the Maze by . . . 695
HOMER B. REED


The Reliability of the Goodenough Draw a Man Test and the Valid- 
ity and Reliability of an Abbreviated Scoring Method. .... . 701 


MOSHE BRILL 


Twin Differences in Intelligence . . . 709
D. CECIL RIFE 


$6.00 per Year - Published Monthly September to May 


WARWICK & YORK, INC. 


BALTIMORE, MD. 


Entered as Second Class Matter Nov. 15, 1921, at the Post Office at Baltimore, Md.
under the Act of March 3, 1879; additional entry as Second Class Matter at York, Pa. 










THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY


Established 1910 


EDITORS 
Jack W. Dunlap, Fordham University
Harold E. Jones, University of California
Percival M. Symonds, Teachers College, Columbia University

H. E. Buchholz, Managing Editor





The price of the Journal is $6.00 a year in the United States; $6.40 in foreign
countries. Part-year subscriptions are 90 cents for each number ordered. Back
volumes are $7.00 each; back issues are $1.10 each except when more than five years
old, and then $1.20 each.
Subscribers should notify the publishers of change in address at least four weeks
before publication of the issue with which the change is to take effect. Claims for
non-receipt of an issue will not be honored unless made within two weeks after
receipt of the next succeeding number.
Unsolicited manuscripts should be accompanied with return postage. Manuscripts,
books and other materials for review, and correspondence regarding editorial
and business matters should be addressed to the Publishers.


WARWICK AND YORK - Publishers - BALTIMORE, MD 





















THE ACTIVITY MOVEMENT 


By Clyde Hissong


In an attempt to overcome the weaknesses of the traditional school 
organization many progressive schools have developed new programs. 
These programs are so similar in character that collectively the 
changes have been referred to as the activity movement. This 
movement has claimed the center of the educational stage for a length 
of time sufficient to have engendered widespread interest in its out- 
comes and in its basic philosophy. 

In Doctor Hissong’s study an attempt has been made to discover 
the principles underlying the present activity movement, to determine 
the influence of traditional concepts in shaping the trends of the 
movement, and to see if in the light of the present knowledge of the 
child and his relation to his environment the movement rests upon a
justifiable basis.

$2.00 plus 10¢ postage.


WARWICK AND YORK 


BALTIMORE 










THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 








Volume XXVI December, 1935 Number 9 








TECHNICAL ASPECTS OF MULTI-TRAIT TESTS. A 
REPLY TO DR. LORGE 


JOHN C. FLANAGAN 
Cooperative Test Service of American Council on Education 


Due to the important place which most students of individual 
differences have given to emotional traits such as interests and atti- 
tudes, numerous scales, questionnaires, inventories, etc. have been 
published. In an effort to secure these data more efficiently a few 
multi-trait instruments have been developed which attempt to assess 
a number of traits from a common set of responses. This procedure 
has a definite effect on the relations between the various trait-scores 
involved which must be considered in any analysis of the results. 
Brief discussions of techniques which may be used to eliminate the 
spurious relationship thus introduced have appeared previously.¹ ²
The issue has been raised again, however, by certain statements
included in a discussion of an analysis of the Bernreuter Personality
Inventory by Lorge, et al.³ Some very interesting data concerning
the relations existing between the scoring keys and the consistency 
of response are presented, but the present writer is forced to disagree 
entirely with such statements as: 


The community of traits is in the keys, and hence, may not be in the 
subjects. 





1 Kelley, T. L., and Krey, A. C.: Tests and Measurements in the Social Sciences. 
Scribners, New York, 1934. 

2 Flanagan, J. C.: Factor Analysis in the Study of Personality. Stanford Uni- 
versity Press, Stanford University, California, 1935, p. 40ff. 

3 Lorge, Irving, et al.: “Personality traits by fiat. II. The consistency of the
Bernreuter Personality Inventory by the Bernreuter and by the Flanagan keys.”
Journal of Educational Psychology, Vol. XXVI, 1935, pp. 427-434.

See also, Lorge, Irving: “Personality traits by fiat. I. The Analysis of the 
total trait scores and keys of the Bernreuter Personality Inventory.” Journal 
of Educational Psychology, Vol. XXVI, 1935, pp. 273-278. 



The intercorrelation of total trait I and total trait II is +.10. Since the 
keys correlated +.00 (on the average), it indicates that summing “Yes,” 
“No,” and “?” weights within a trait spuriously raises the correlation by the 
mixture of heterogeneous data. 


The Flanagan keys are statistically independent, but the scores derived 
therefrom lack consistency.¹


There are two fundamental issues involved in this discussion—the 
effect of correlation between the scoring-keys on the scores obtained 
from a multi-trait instrument and the concept of reliability with 
special reference to the proper technique to be employed in obtaining 
a reliability coefficient representative of the consistency present in
responses to the questionnaire type of item. 


CORRELATION BETWEEN SCORING-KEYS 


As a necessary basis to a consideration of the effect of correlation 
between scoring-keys on the individual scores, a brief review of the 
derivation and function of scoring weights will be presented. In 
general, a scoring-weight for a particular item-response is a single 
figure indicating the product of the individual’s score on the item and 
the appropriate regression-coefficient for predicting the dependent 
variable from the item-responses. Therefore, when traits are not 
independent, the items predicting one trait would tend also to predict 
standing in the other variable, at least in so far as they include elements 
common to the two variables. This lack of independence would be 
demonstrated if in a representative sample of individuals competent 
judges² find a correlation to exist between two traits. The tendency
for many item-responses to be associated with both traits would, of 
course, tend to introduce a certain amount of correlation between 
the scoring-weights. Only when the traits were known to be inde- 
pendent in the population used to determine the scoring-weights, 
would it be expected that the scoring-keys for a given pair of traits 
would be uncorrelated. 

To make this point clearer a concrete situation will be described. 
Suppose it is desired to measure the two traits “doctor-interests” and





1 Ibid. 

2 The consensus of a large number of competent judges is cited here as repre- 
senting a valid means of determining individual differences. The above argument 
however is in no way affected by the method used to determine individual status.







“lawyer-interests.” If the doctors and lawyers whose responses are
used to determine the scoring-weights tend to have interests in common 
which distinguish them from people in general, the scoring-keys would 
be found to be positively correlated. Although it should be clearly 
understood that the present writer has a very strong preference 
for independent or uncorrelated traits and would prefer to deal 
with such traits whenever possible, he certainly cannot agree to the 
latter part of the following statement. 


A test before its validity is to be demonstrated must show, at least,
reliability of a specific magnitude consonant with the task the test is to do;
consistency within its modes of response; and, in the case of a multi-trait
instrument, independence within the keys.¹


As mentioned earlier in this discussion, if multi-trait instruments 
are used, the correlations obtained between the trait-scores do not 
give the correct value for the correlation between the traits involved— 
except in the theoretical case in which the trait-scores are perfectly 
reliable. It seems undesirable in a short article such as this to repeat 
or elaborate on the discussion of the techniques which may be employed 
to obtain correlation coefficients which are unaffected by this condi- 
tion. The discussions of Kelley and Flanagan cited earlier give 
examples and a very elementary one will be added here. 

Suppose that a multi-trait test were marked by a large number 
of subjects who did not read the items but merely marked one of the 
three alternatives to each item in a random fashion. Then a person 
who happened to mark items which led to a fairly large positive score 
in Trait A would also tend to get a fairly large positive score in Trait B, 
if the scoring-weights for these two traits were positively correlated. 
Thus the scores would show a certain intercorrelation pattern depend- 
ent on the correlation between the scoring-keys which would obviously 
be entirely spurious as an indication of the relations between the traits 
in the subjects tested, since the reliability coefficients would be pre- 
cisely zero. 
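The random-marking argument above can be tried numerically. The following is only a sketch under stated assumptions: the scoring-weights are invented at random (they are not the Bernreuter keys), the item count of 125 and the group of 2000 subjects are illustrative defaults, and the function names `pearson` and `random_marking_correlation` are hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def random_marking_correlation(correlated_keys, n_items=125, n_subjects=2000, seed=0):
    """Correlation between Trait A and Trait B scores when every subject
    marks one of three alternatives per item purely at random."""
    rng = random.Random(seed)
    # Hypothetical scoring-weights: one weight per (item, alternative).
    key_a = [[rng.randint(-7, 7) for _ in range(3)] for _ in range(n_items)]
    if correlated_keys:
        # Trait B weights follow the Trait A weights plus a small disturbance.
        key_b = [[w + rng.randint(-3, 3) for w in item] for item in key_a]
    else:
        # Trait B weights are drawn independently of the Trait A weights.
        key_b = [[rng.randint(-7, 7) for _ in range(3)] for _ in range(n_items)]
    scores_a, scores_b = [], []
    for _ in range(n_subjects):
        marks = [rng.randrange(3) for _ in range(n_items)]  # random marking
        scores_a.append(sum(key_a[i][m] for i, m in enumerate(marks)))
        scores_b.append(sum(key_b[i][m] for i, m in enumerate(marks)))
    return pearson(scores_a, scores_b)


if __name__ == "__main__":
    print("correlated keys:  ", round(random_marking_correlation(True), 2))
    print("uncorrelated keys:", round(random_marking_correlation(False), 2))
```

With positively correlated keys the two scores correlate substantially although no subject has read an item; with independently drawn keys the score correlation stays near zero.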

On the other hand, if the test were administered to another group 
under excellent conditions and the test was found to give perfectly 
reliable scores, the correlations between the trait-scores in this group 
would indicate accurately the relations between the traits measured 
by the test, regardless of the nature of the correlations between the 
keys. In practice, the conditions are always somewhere between 





1 Lorge, I.: Op. cit. 













these extremes, and in order to get the relations between the traits 
involved, unaffected by the spurious factors mentioned, two independ- 
ent scores for each of the traits, obtained from similar forms of a test, 
or, on similar halves of the test, are necessary. 
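One standard way of using two independent scores per trait is the classical correction for attenuation. The sketch below is an illustration of that general idea only; it is not claimed to be the exact formula of the discussions cited above, and the function name and the numerical values are hypothetical. A half of Trait A is correlated with the other half of Trait B, so that the two scores share no item responses, and the result is divided by the geometric mean of the half-test reliabilities.

```python
from math import sqrt


def corrected_trait_correlation(r_a1_b2, r_a2_b1, r_a1_a2, r_b1_b2):
    """Estimate the correlation between traits A and B from half-test scores.

    r_a1_b2, r_a2_b1: correlations between a half of A and the *other* half
                      of B (the two scores share no item responses).
    r_a1_a2, r_b1_b2: correlations between the two halves of the same trait.
    """
    if r_a1_a2 <= 0 or r_b1_b2 <= 0:
        raise ValueError("half-test reliabilities must be positive")
    cross = (r_a1_b2 + r_a2_b1) / 2.0       # mean cross-half correlation
    return cross / sqrt(r_a1_a2 * r_b1_b2)  # divide out the unreliability


if __name__ == "__main__":
    # Made-up illustrative values, not data from any study.
    print(round(corrected_trait_correlation(0.40, 0.40, 0.80, 0.80), 2))
```

The arithmetic mean of the two cross-half correlations is used here so that negative values can be handled; other averaging choices are possible.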

An investigation of the degree of freedom existing for the correla- 
tion coefficient between two scales of a representative multi-trait 
test for which the correlation coefficient between the scoring-weights 
in the keys has a given value has been made. Inspection of the 
F1-C and F2-S scales for the Bernreuter Personality Inventory indicates 
that although the correlation between the respective scoring-weights 
for the two scales is practically zero, the blank could be marked by a 
group of individuals so that their distributions of scores conformed 
to those given in the norms and yet have the correlation coefficient 
between the scores on the two scales assume any value from —1.00 
to +1.00.¹

The above conclusions follow directly from an inspection of the 
possible combinations of scoring-weights. For instance, it is possible 
for an individual to obtain a score above the ninety-ninth percentile 
or below the first percentile on both the F1-C and F2-S scales; or, 
it is possible to get a score below the first percentile on one scale and 
above the ninety-ninth on the other. Of course an enormous number 
of intermediate combinations of scores are possible. The maximum 
and minimum sums for a pair of scores on the F1-C and F2-S scales 
are +496 and —436 respectively and the maximum differences are 
six hundred thirty-two and four hundred twenty, depending on 
whether F1-C or F2-S is taken as positive. 
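The kind of inspection described above can be sketched in a few lines. Because each item is marked independently of the others, the largest and smallest attainable sum (or difference) of two scale scores is found item by item. The two small keys below are hypothetical and serve only to show the computation; they are not the published F1-C and F2-S weights.

```python
def extreme_sums_and_differences(key_a, key_b):
    """Largest and smallest attainable A + B and A - B over all ways of marking.

    key_a[i][j] and key_b[i][j] are the weights that alternative j of item i
    contributes to scale A and to scale B.
    """
    sums = [[a + b for a, b in zip(ia, ib)] for ia, ib in zip(key_a, key_b)]
    diffs = [[a - b for a, b in zip(ia, ib)] for ia, ib in zip(key_a, key_b)]
    return {
        "max_sum": sum(max(item) for item in sums),
        "min_sum": sum(min(item) for item in sums),
        "max_a_minus_b": sum(max(item) for item in diffs),
        "max_b_minus_a": -sum(min(item) for item in diffs),
    }


if __name__ == "__main__":
    # Two hypothetical items with weights for "yes", "no" and "?".
    key_a = [(3, -2, 0), (-1, 4, 1)]
    key_b = [(1, 2, -3), (-2, -1, 5)]
    print(extreme_sums_and_differences(key_a, key_b))
```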

Although this indicated freedom of relationship for scales whose 
scoring-weights show zero correlation is significant, it is very important 
to know to what extent keys, having a very high intercorrelation, force 
their relationship into the subject’s scores. For this purpose the 
scoring-weights of the B1-N and B3-I scales of the Bernreuter Per- 
sonality Inventory were studied. The respective scoring-weights 
on these scales are known to be highly correlated and the correlation 
coefficients usually reported between pairs of individual’s scores are 
above .95. Although, as would be expected, there is not as much 
freedom possible in the relation between these scales, it is nevertheless 





1 Bernreuter, R. G.: The Personality Inventory. Stanford University Press, 
Stanford University, California, 1931. The revised manual published in 1935 
describes six scales, symbols for which are used throughout this article: F1-C,
confidence in oneself; F2-S, sociability; B1-N, neurotic tendency; B2-S, self-sufficiency;
B3-I, introversion-extroversion; B4-D, dominance-submission.







possible for a person scoring at the median on the B3-I scale to score 
above the ninety-seventh percentile or below the first percentile on 
the B1-N scale; similarly, a person scoring at the median on the B1-N 
scale could obtain a score above the ninety-ninth percentile or
below the first percentile on the B3-I scale. It is possible to obtain a
B1-N score one hundred sixty-six points higher than the corresponding 
B3-I score, and conversely it is possible to get a B3-I score one hundred 
ninety-nine points higher than the corresponding B1-N score. It 
then follows that rather than being forced into a narrow range of 
degrees of relationship by the high correlation existing between the 
corresponding scoring-weights, the subjects are free to show any 
degree of relationship between the scores on the two scales from a 
perfect positive correlation to a very substantial negative coefficient. 

The above discussion is, of course, theoretical. The relationship 
existing in the scoring-keys is there because the traits were related in 
the group from which the weights were derived and it would be very 
surprising if another population showed very dissimilar relationships. 
Differences in the relations between the traits in various groups do
exist, however, as, for example, in the case of high-school and college
students on the Personality Inventory Scales, and, if the data are
correctly handled these differences are not obscured by the relation- 
ships existing between the scoring-keys. 


CONSISTENCY: CORRELATION BETWEEN “AVERAGE
TRAIT-RESPONSE SCORES”


In the second of the two articles previously referred to Lorge states: 


The Bernreuter Personality Inventory has been shown to be lacking in
reliability (for individual diagnosis and prognosis), lacking in trait
consistency, certainly lacking in independence.²


With respect to consistency or reliability a reference to Thorndike’s 
recent discussion, “Unity or Purity in Traits and Tests,”³ seems
pertinent. In this brief but very excellent exposition of the general 
problem of unitary traits, summary scores, etc. Thorndike says: 


If these (behavior items of particular abilities, habits, wants, attitudes, 
interests) do not correlate perfectly they should not in strictness be attributed 
to a single unitary trait. 





1 Flanagan, J. C.: Op. cit. 

2 Lorge, I., et al.: Op. cit.

3 Thorndike, E. L.: “Unity or Purity in Traits and Tests.” Occupations,
Vol. XII, 1934, pp. 27-29.













In practice, however, if the intercorrelations among a set of particular 
abilities, attitudes, etc., are very high so that the community of causation 
very greatly outweighs the diversities, the latter may be neglected. 


Such average or summary scores are very valuable; they may be of notable 
service in distinguishing individuals and in predicting important facts. 


The discussion continues with a statement of the limitations of 
summary scores and the necessity for some assurance that the average 
of all the item scores has a definite meaning. The author especially 
warns against treating any such average as though it were a measure 
of a unitary or pure trait. 

It should be clear that if we have a summary score which happens 
to be a mixture of just two unitary or pure traits combined in equal 
proportions, it would be theoretically possible to divide the test into 
halves such that although each half might possess high or even perfect 
reliability within itself, its correlation with the other half would be 
zero. By changing the proportions of items of the two varieties in 
the halves of the test it would be possible to get halves such that the 

correlation between them would have any value whatsoever between 
' gero and unity. Clearly little indication of the reliability of a test 
ean be obtained unless similar parts or forms are compared. Not 
only must the parts be similar in form or content but also similar in 
length. If average scores from the various modes of response are 
“used to compare individual trait-scores, these are not “parts” in 
the ordinary sense of groups of items, since an item belongs with one 
group or another in an individual’s score, depending on how this 
individual happened to mark it. Since the number of items included 
in the “average trait-response score” for a particular mode of answering
would vary, the reliability of this average as an indication of an
individual’s “true” trait-score would vary from individual to
individual. Average scores for a particular mode of response seem to be
vague in character and certainly of very doubtful value for any type 
of analytical work. 
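The point about halves of a mixed test can be illustrated by a small simulation. It is a sketch under assumptions that are not in the text: two independent, normally distributed pure traits, a hypothetical test of 20 items for each trait, item scores equal to the trait plus independent error, and 3000 simulated subjects. The function names are hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def half_score(x, y, n_x_items, n_y_items, rng, noise_sd=1.0):
    """Sum of item scores for a half containing n_x_items items of trait X
    and n_y_items items of trait Y."""
    total = 0.0
    for _ in range(n_x_items):
        total += x + rng.gauss(0.0, noise_sd)
    for _ in range(n_y_items):
        total += y + rng.gauss(0.0, noise_sd)
    return total


def split_half_correlation(x_items_in_first_half, n_subjects=3000, seed=0):
    """The whole test has 20 X items and 20 Y items; each half has 20 items.
    The first half receives the stated number of X items, the second half
    receives the remaining X items, and Y items fill each half up to 20."""
    rng = random.Random(seed)
    k = x_items_in_first_half
    first, second = [], []
    for _ in range(n_subjects):
        x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # independent pure traits
        first.append(half_score(x, y, k, 20 - k, rng))
        second.append(half_score(x, y, 20 - k, k, rng))
    return pearson(first, second)


if __name__ == "__main__":
    for k in (20, 15, 10):
        print(k, "X items in first half:", round(split_half_correlation(k), 2))
```

When all X items fall in one half and all Y items in the other, the halves are nearly uncorrelated although each is internally consistent; when the halves are similar in composition the correlation is high, and unequal proportions give intermediate values.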

However, motivated by the very low correlation between “average
trait-response scores” for “yes” and “no” on Scale F1-C of the
Bernreuter Personality Inventory reported by Lorge,² the present
writer calculated “average trait-response scores” on the variables
for a group of patients in a tuberculosis sanatorium. The blanks

1 Ibid., p. 57.
2 Lorge, I., et al.: Op. cit.







had been administered by him as part of a battery of tests to be used 
in connection with a guidance and rehabilitation program. The 
comparison of the results obtained from this group with those obtained 
by Lorge was quite surprising as shown in Table I. Whereas Lorge 
found a correlation between average F1-C trait-score for “yes”
responses and average F1-C trait-score for “no” responses of .23
(SE = .09), the present writer found a correlation of .88 (SE = .04). 
This great discrepancy led the writer to obtain a third sample to 
compare with the previous ones. For this purpose he obtained the 
blanks of a group of one hundred twenty-one high-school students 
who had taken the inventories as a part of an intensive study of the 
structure of personality.¹ The intercorrelations found between the
“average trait-response scores” of this group, as shown in Table I,
are in substantial agreement with those found in the group of
sanatorium patients.

TABLE I. THE INTERCORRELATIONS OF “AVERAGE TRAIT-RESPONSE SCORES” ON
“YES,” “NO” AND “?” RESPONSES FOR SCALES F1-C AND F2-S OF THE
BERNREUTER PERSONALITY INVENTORY

                                     F1-C                          F2-S
                         Lorge    Flanagan  Flanagan    Lorge    Flanagan  Flanagan
Variables               N = 103    N = 28   N = 121    N = 103    N = 28   N = 121

Average “yes” with
  average “no”            .23       .88       .85        .79       .79       .73
Average “yes” with
  average “?”             .10       .00       .14        .16       .20       .11
Average “no” with
  average “?”            −.02      −.02       .12        .14       .24       .07
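The standard errors quoted with these coefficients agree with the classical large-sample formula for the standard error of a correlation coefficient, (1 − r²)/√N. The text does not state which formula was used, so the following sketch rests on that assumption; the function name is hypothetical.

```python
from math import sqrt


def standard_error_of_r(r, n):
    """Classical large-sample standard error of a correlation: (1 - r^2) / sqrt(n)."""
    return (1.0 - r * r) / sqrt(n)


if __name__ == "__main__":
    # Coefficients and group sizes as given in the text and in Table I.
    for label, r, n in [("Lorge, F1-C", 0.23, 103), ("Flanagan, F1-C", 0.88, 28)]:
        print(label, round(standard_error_of_r(r, n), 2))
```

Under this assumption the two cases give .09 and .04, the values reported above.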

For purposes of comparison certain of the constants of the dis- 
tributions of scores are given in Table II. 

As previously mentioned the significance of these correlation 
coefficients is at least vague if not definitely misleading. Due to the 
interdependence of the average scores, some of the coefficients are too 
large fairly to represent the amount of consistency present in the 
items. At the same time the small number of “?” responses tends
to make the degree of consistency with these responses appear smaller
than it is. To obtain more adequate information as to the value of
the “?” responses as components of the total scores, the writer selected





1 The writer wishes to acknowledge the courtesy of Mr. P. S. de Q. Cabot
in making this material available to him. 




TABLE II. CONSTANTS FOR THE DISTRIBUTIONS OF “AVERAGE TRAIT-RESPONSE
SCORES” MADE BY ONE HUNDRED TWENTY-ONE HIGH SCHOOL BOYS ON
SCALES F1-C AND F2-S OF THE BERNREUTER PERSONALITY INVENTORY

                                                     “Yes”    “No”     “?”
Average number of responses                           52.8    60.6    11.6
Mean of “average trait-response scores,” F1-C         1.12   −1.52    0.42
Standard deviation of “average trait-response
  scores,” F1-C                                       0.73    0.49    1.19
Mean of “average trait-response scores,” F2-S        −0.13   −0.31   −2.41
Standard deviation of “average trait-response
  scores,” F2-S                                       0.55    0.43    1.52













from the group of one hundred twenty-one high school boys mentioned
above, those blanks, twenty-five in number, on which twenty
or more “?” responses had been marked. The correlation coefficients
for the “average trait-response scores” involving “?” responses are
shown in Table III.


TABLE III. THE CORRELATION COEFFICIENTS OF “AVERAGE TRAIT-RESPONSE
SCORES” FOR “?” RESPONSES WITH THOSE FOR “YES” AND “NO”
RESPONSES ON THE F1-C AND F2-S SCALES OF THE BERNREUTER
PERSONALITY INVENTORY, AS OBTAINED FROM A SELECTED GROUP
OF BOYS MARKING TWENTY OR MORE “?” RESPONSES

N = 25
                                                     “Yes”    “No”     “?”
Average number of responses                           43.9    48.3    32.8
Mean of “average trait-response scores,” F1-C         1.04   −1.37    0.30
SD “average trait-response scores,” F1-C              0.77    0.63    0.52
Correlation between “average trait-response scores,”
  F1-C, and those for “?”                             0.30    0.27
Mean of “average trait-response scores,” F2-S        −0.18   −0.40   −2.77
SD “average trait-response scores,” F2-S              0.59    0.56    0.46
Correlation between “average trait-response scores,”
  F2-S, and those for “?”                             0.37    0.21














Although the coefficients shown in Table III are substantially
larger than the corresponding values in Table I, it is apparent that
the question mark responses are not as consistent, even taking into
account the number involved, as are the other two responses. This is
not surprising since the questions are designed so that the “yes”
and “no” responses usually represent the extreme values and the
“?” response is intermediary.

In view of the data presented in Tables II and III it can hardly 
be said that an analysis of “average trait-response scores” reveals
a lack of consistency in the “F” scales of the Bernreuter Personality
Inventory. 







CONSISTENCY: RELIABILITY COEFFICIENTS 


A table of the reliability coefficients found by various investigators 
for the six published scales of the Bernreuter Personality Inventory 
is given below. 


TABLE IV. RELIABILITY COEFFICIENTS REPORTED FOR THE SIX SCALES OF THE
BERNREUTER PERSONALITY INVENTORY

Investigator  Subjects                 B1-N      B2-S      B3-I      B4-D      F1-C      F2-S
Bernreuter    631+ College Men         .90       .84       .88       .88
                                     (σ = 82)  (σ = 53)  (σ = 60)  (σ = 68)¹
Stagner       280 College Men          .88       .80       .87       .85
                                     (σ = 77)  (σ = 49)  (σ = 48)  (σ = 60)
Bernreuter    145+ High School Boys    .88       .78       .87       .87
                                     (σ = 75)  (σ = 50)  (σ = 47)  (σ = 57)
Flanagan      305 High School Boys     .82       .80       .85       .84
                                     (σ = 73)  (σ = 53)  (σ = 47)  (σ = 54)
. . .         . . .                                                            .86       .78
                                                                             (σ = 80)  (σ = 59)
. . .         . . .                                                            .89       .83
                                                                             (σ = 87)  (σ = 62)

1 The figures in parentheses give the standard deviations of the groups used in the calculation of
the respective reliability coefficients.

Although it can hardly be said that the present scales are as
reliable as one would like for purposes of individual diagnosis or
prognosis, it should be remembered that a reliability coefficient of
0.81 indicates a correlation with the “true-scores” for the individuals
of 0.90.¹ And if, following one of the customary groupings, the
distribution of scores were divided into groups of seven per cent, 
twenty-four per cent, thirty-eight per cent, twenty-four per cent, 
seven per cent and the individuals in the groups assigned the letter- 
ratings A, B, C, D, and E respectively, just 33.7 per cent of the total 
group would be given the rating of a group adjoining the one to which 
they properly belonged and less than one-half per cent would be placed 
in a group which was two steps removed from their correct group. 
When the reliability coefficient is 0.90 those placed one step from their 
proper position include only 24.3 per cent of the group, and only one 
individual in ten thousand is placed as far as two steps from his proper 
position. 
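The reasoning in this paragraph can be sketched numerically. The correlation of obtained scores with true scores is the square root of the reliability coefficient, so that a reliability of 0.81 gives 0.90. The misplacement rates then follow if one assumes, as this sketch does, that true and obtained scores are jointly normal and that the same five-group cut points are applied to both. These assumptions and the function name are not taken from the text, and the sketch is not claimed to reproduce the quoted percentages exactly.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def step_error_rates(reliability, shares=(0.07, 0.24, 0.38, 0.24, 0.07), n_grid=4000):
    """Return [p0, p1, p2, ...]: the proportion of individuals whose
    obtained-score group lies 0, 1, 2, ... steps from their true-score group."""
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must lie strictly between 0 and 1")
    rho = sqrt(reliability)             # correlation of obtained with true scores
    noise_sd = sqrt(1.0 - reliability)  # SD of obtained score about rho * true
    # Cut points on the standard normal scale that give the stated group sizes.
    cuts, cumulative = [], 0.0
    for share in shares[:-1]:
        cumulative += share
        cuts.append(STANDARD_NORMAL.inv_cdf(cumulative))
    edges = [-50.0] + cuts + [50.0]     # +/-50 SD stands in for infinity
    rates = [0.0] * len(shares)
    low, high = -8.0, 8.0
    width = (high - low) / n_grid
    for k in range(n_grid):             # midpoint rule over the true score
        true = low + (k + 0.5) * width
        weight = STANDARD_NORMAL.pdf(true) * width
        true_group = sum(true > c for c in cuts)
        for group in range(len(shares)):
            upper = STANDARD_NORMAL.cdf((edges[group + 1] - rho * true) / noise_sd)
            lower = STANDARD_NORMAL.cdf((edges[group] - rho * true) / noise_sd)
            rates[abs(group - true_group)] += weight * (upper - lower)
    return rates


if __name__ == "__main__":
    for reliability in (0.81, 0.90):
        same, one_step, two_steps = step_error_rates(reliability)[:3]
        print(reliability, round(same, 3), round(one_step, 3), round(two_steps, 4))
```

The first figure printed for each reliability is the proportion placed in the correct group, the second the proportion placed one step away, and the third the proportion placed two steps away.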

It seems to the present writer that information which classifies 
individuals into five groups such that sixty-six to seventy-six indi- 
viduals in one hundred are placed in the group corresponding to their 





1 Kelley, T. L.: Statistical Method. The Macmillan Co., New York, 1924, p. 201. 












































“true-score” in the trait or combination of traits measured by the
particular instrument, should prove very valuable in many types of 
work involving individual diagnosis and prognosis. 


INDEPENDENCE 


The statement is made: “The traits as measured by the Bernreuter
Personality Inventory are not independent, or pure, or specific.”¹
That the four scales originally published do not give independent
scores has been known for a number of years, but no place in the article
do I find a refutation of the fact that the two recently published
scales F1-C and F2-S are independent. The writers state:


The intercorrelation of total trait I and total trait II is +.10. Since the 
keys correlated +.00 (on the average) it indicates that summing “yes,” 
“no,” and “?” weights within a trait spuriously raises the correlation by the 
mixture of heterogeneous data. 


Any “indication” of a “raised correlation” based on a coefficient
of +.10 with a standard error of .10 seems to the present writer hardly
to require an explanation. To use such a coefficient to impute a
spurious action to “the mixture of heterogeneous data,” (which
mixture appears to be brought about by not marking all the items
the same, e.g., marking some items “no” and some “?” instead
of all “yes”) seems somewhat extreme. Bernreuter found a correlation
coefficient of +.11 (SE = .08) between the F1-C and F2-S scores
in a group of college men and it is reasonable to suppose that these 
traits which showed practically zero correlation in a group of high- 
school boys may show slightly different correlations when different 
sorts of subjects are used. This is especially to be expected when 
it is remembered that the intercorrelations between the four original 
scales for the Bernreuter Personality Inventory were found to be 
different for high-school students than for college students or adults. 
Certainly, however, it can hardly be claimed that evidence has been 
presented that these two scales lack sufficient independence for all 
practical purposes. 

A further investigation of the effect of combining “yes,” “no,”
and “?” scores was made by calculating the correlation coefficients
between the F1-C and F2-S “average trait-response scores” for “yes,”
“no,” and “?”. The coefficients are shown in Table V together
with the coefficient obtained by correlating total scores in the two 
traits. The coefficient of +.20 between the “average trait-response 


1 Lorge, I., et al.: Op. cit. 










scores” for “yes” on the different scales and the coefficient of .02
for the total scores on these scales afford an excellent refutation of the
statement that “summing ‘yes,’ ‘no,’ and ‘?’ weights within a trait
spuriously raises the correlation by the mixture of heterogeneous 
data.” 


TABLE V. THE CORRELATION COEFFICIENTS BETWEEN VARIOUS SCORES OF ONE
HUNDRED TWENTY-ONE HIGH SCHOOL BOYS DERIVED FROM THE F1-C AND
F2-S SCALES FOR THE BERNREUTER PERSONALITY INVENTORY

                                                    CORRELATION
VARIABLES                                           COEFFICIENT
Average “yes” F1-C with average “yes” F2-S              .20
Average “no” F1-C with average “no” F2-S               −.01
Average “?” F1-C with average “?” F2-S                 −.07
Total score F1-C with total score F2-S                  .02


SUMMARY 


This paper has attempted to establish the validity of the following 
statements with respect to the fundamental issues of (1) correlation 
between the keys of a multi-trait instrument, and (2) the concept 
of reliability. 

Negatively.—1. The amount of correlation existing between the 
scoring-keys of a multi-trait instrument has not been shown to have 
practical significance. 

2. For some types of test, showing that the test may be divided 
into parts having rather low intercorrelations indicates practically 
nothing concerning the test’s reliability or consistency. 

Positively—1. The relations between the traits included in a 
multi-trait instrument may be obtained as correlation coefficients in 
such a manner that the effect of presence or absence of “correlation 
between the keys” (more precisely, the amount of correlation between 
relatively unique or “error” factors), is eliminated.

2. The best evidence available indicates that the reliability of 
the various scales for the Bernreuter Personality Inventory is best 
represented by a reliability coefficient of approximately .85. In 
other words about seventy per cent of the subjects would be rated 
correctly and almost one hundred per cent rated either correctly or 
with an error of but one step on a five-point scale in whatever the 
particular scale measures. 

It hardly seems necessary to point out that this discussion has 
been limited to certain aspects of inter-item and inter-scale relation- 
ships of multi-trait tests and has omitted entirely any mention of 
that aspect which is paramount for practical purposes, validity. 







PERSONALITY TRAITS BY FIAT II: A CORRECTION 


IRVING LORGE 


Division of Psychology, Institute of Educational Research, Teachers College, 
Columbia University 


In the article “Personality Traits by Fiat II. The consistency of
the Bernreuter Personality Inventory by the Bernreuter and the
Flanagan Keys”¹ Lorge reports, in reference to the consistency of the
Bernreuter Inventory by the Flanagan Keys, a series of correlations
which are in error. The error, moreover, was of such a nature as
materially to change the argument concerning the usefulness of the
Flanagan keys and the practical significance of factor analysis.²

The correction affects the argument and the data from the para- 
graph in the middle of page 431 to the end of the article. The correc- 
tion should read as follows: 

Flanagan³ has recently developed two statistically independent
trait keys for the Bernreuter Personality Inventory through the
technique of the Hotelling method of principal components. The
intercorrelations between the Flanagan keys are +.04, −.00 and
−.04 for the “Yes,” for the “No,” and for the “?” keys respectively,
indicating that these traits are independent in the keys. Flanagan has
named these traits tentatively “self-confidence” and “sociability,”
which shall be referred to as F1-C and F2-S respectively. The same
method of analysis employed for the Bernreuter keys was used to 


1 This journal, Vol. XXVI, 1935, pp. 427-434.

2 I am indebted to Dr. Flanagan for demonstrating the discrepancy between
the results as reported in the cited article and those that he had obtained by
duplicating my analysis for new material. The error was the result of a clerk’s
forgetting the technique of algebraic summation, and my dependence upon chance.
I checked fifty consecutive calculations which I found errorless. The clerk marked
his calculations “checked.” When Dr. Flanagan reported the discrepancy to me,
I checked the entire Flanagan key data, and found fifty-eight errors. It was a
blow to have one’s confidence in chance and in a clerk impaired simultaneously.
This explanation does not, and cannot, extenuate the error, which must be debited
entirely against me.

I have checked all other calculations connected with the above article, and
with the preceding article “Personality Traits by Fiat I. The analysis of the Total
Trait Scores and Keys of the Bernreuter Personality Inventory,” Journal of
Educational Psychology, Vol. XXVI, 1935, pp. 273-278, only to find them free of
error on all other points concerning the Bernreuter scores and keys.


3 Flanagan, J. C.: Factor analysis in the study of personality. Stanford Univer- 
sity Press, 1935. 







examine the consistency of the Flanagan keys. The average trait- 
response score as determined by the “Yes,” by the “No,” and by the
“?” responses of each of the one hundred three individuals was com-
puted by application of Flanagan keys, F1-C and F2-S. In addition, 
the total trait F1-C scores and the F2-S scores were computed. 

Table III (corrected) presents the corrected correlation coefficients. 


TABLE III (CORRECTED)

The intercorrelations of the average trait scores (as computed separately for all
“Yes,” for all “No,” and for all “?” responses) and the total trait scores by applying
the Flanagan independent trait keys developed for the Bernreuter Personality
Inventory: For 103 educated adults where

F1-C Yes, F1-C No and F1-C ? refer to the average trait score determined by
“Yes,” “No” and “?” responses respectively, and F1-Total refers to the total
trait scores. The F2-S rubrics refer to similar computations for the F2-S key.








  2      3      4      5      6      7      8
F1-C   F1-C   F1-C   F2-S   F2-S   F2-S   F2-S
 No      ?    Total   Yes    No      ?    Total
= sh eer .83 .07 .87 —.05 | —.06 .00 
Sa ll .80 —.01 | —.03 | —.05 
i |... Aree .12 .09 | —.02 | —.15 
i oe Loe eek Eh (Odden. Danaea b aeons © aetes .05 
eS ec yo Me es oy E wens .84 .18 .73 
coe eu M. gee Blows & denen B whede .10 .75 
re Se Ter err Ser) Meer es sere .37 
8. F2-S Total........ 


























It is not only evident that the Flanagan keys are independent, but 
also that the Flanagan keys are consistent. This independence and 
consistency satisfy two of the three desiderata necessary within the 
instrument prior to the demonstration of its validity. Concerning 
the third desideratum, the reliability is approximately .86 for F1-C 
and .78 for F2-S, which is more than adequate for group separation, 
but which is short of the reliability of .94 considered requisite for 
“problems involving individual classification.”¹ Lengthening the
inventory, and/or eliminating the relatively non-contributive “?”
responses from the total trait score may increase the reliability to meet
Kelley’s criterion.
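How much lengthening might be needed can be sketched with the Spearman-Brown prophecy formula. Lorge does not name this formula; it is brought in here as an assumption, and it presumes that any added items behave like the existing ones. The function name `lengthening_factor` is hypothetical.

```python
def lengthening_factor(current_reliability: float, target_reliability: float) -> float:
    """Spearman-Brown: factor by which test length must be multiplied
    to raise reliability from the current to the target value,
    assuming the added items are comparable to the existing ones."""
    r, big_r = current_reliability, target_reliability
    return big_r * (1.0 - r) / (r * (1.0 - big_r))


# Reliabilities quoted in the text: .86 for F1-C and .78 for F2-S;
# .94 is Kelley's criterion for individual classification.
factor_f1c = lengthening_factor(0.86, 0.94)
factor_f2s = lengthening_factor(0.78, 0.94)

print(f"F1-C would need about {factor_f1c:.2f} times its present length")
print(f"F2-S would need about {factor_f2s:.2f} times its present length")
```

Under these assumptions the F1-C key would need roughly two and a half times as many comparable items, and the F2-S key somewhat more than four times as many.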

Clearly, Flanagan’s use of the Hotelling component analysis has 
made a positive contribution to the practice of test-construction. 





1 Kelley, T. L.: Interpretation of Educational Measurements, p. 211. 










The various factor methods, in so far as they allow for the computing
of independent factor keys, may open the way to simplification of keys,
and the elaboration of the concept of trait-purity. Flanagan has
opened the Bernreuter Personality Inventory to validation. The
Hotelling analysis does not obviate the requirement of logic in the
understanding of personality traits. It is beyond the scope of
the Hotelling, or any other factor analysis, to yield the name of the
trait. The trait can only be evaluated against psychologically significant
and logically determinate criteria based upon human behaviors.

Flanagan has provided the Bernreuter Personality Inventory with 
a set of independent keys which results also in consistent scores. 
If the reliability can be increased to meet Kelley’s criterion of .94, all
that need now be done with the Bernreuter Personality Inventory,
before recommending it for clinical use, is to find out what the Flanagan
keys measure. The validity is still indeterminate.
















THE NATURE VERSUS NURTURE PROBLEM 


PART II. THE CONTRIBUTIONS OF NATURE AND NURTURE TO
INDIVIDUAL DIFFERENCES IN INTELLIGENCE


FRANK K. SHUTTLEWORTH 
Yale University 


The history of any fundamental problem reveals a continuous 
process of developing the theoretical formulation of the problem, of 
collecting data, of refining the formulation of the problem, and of 
seeking more crucial data. A previous paper¹ was devoted to a
more adequate formulation of the nature versus nurture problem, to 
a catalogue of conditions and limitations necessary to this formula- 
tion, and to an exposition of the implications of possible solutions. 
This paper attempts a more adequate analysis of the available data 
bearing on the contributions of hereditary and environmental differ- 
ences to the variance or to the individual differences in intelligence. 
Such an attempt should serve four purposes: first, to illustrate what 
seem to the writer to be the most suitable statistical methodologies; 
second, to point out limitations in the available data and thus to 
provide direction for the collection of more adequate and more crucial 
data; third, to provide a closer approximation to an adequate solution 
of the problem; and fourth, to provide a more solid foundation for the 
development of certain implications of a solution of the problem. 


THE AVAILABLE DATA 


For the purposes of this paper data of Holzinger, Freeman, and
Newman on identical twins² and of Barbara S. Burks on foster children
and a control group of true children³ will be employed. Selection
of the results of the Stanford study of foster children rather than the 
results of the Chicago study is made for two reasons: Selective place- 





1 Shuttleworth, Frank K.: “The Nature Versus Nurture Problem. Part I.
Definition of the Problem.” J. Educ. Psychol., Vol. XXVI, 1935, pp. 561-578.

2 Holzinger, Karl J.: “The Relative Effect of Nature and Nurture Influences on
Twin Differences.” J. Educ. Psychol., Vol. XX, 1929, pp. 241-248.

3 Burks, Barbara S.: The Relative Influence of Nature and Nurture Upon Mental
Development; A Comparative Study of Foster Parent-Foster Child Resemblance and
True Parent-True Child Resemblance. The Twenty-seventh Year Book of the
National Society for the Study of Education, Part I, pp. 219-316.



ment and adoption seems less probable and, most important, a control 
group of true children equated as to environmental differences is 
available. Some use will also be made of the preliminary results of 
Alice M. Leahy on a foster and control population of Minnesota 
children.¹

The data reported by Holzinger on the resemblance of identical
twins are extremely scanty, consisting only of a raw correlation of .88
between Stanford Binet IQ’s. In order to make this datum serve
the purposes of this paper the following quite arbitrary assumptions
are made: (1) That the diagnosis of the twin identities is correct,
(2) that the reliability of the Stanford Binet IQ for the range of talent
involved is .91, (3) that the population consists of native-born, white,
non-Hebrew children of North European stock, and (4) that the true
standard deviation of the IQ’s of these children is 13.75 IQ points.
The first of these assumptions is probably correct, the second repre-
sents the best guess of Truman Kelley,² the third is largely immaterial
and makes the population comparable with that of Burks, and the
fourth is a matter of computational convenience. These assumptions
merely permit a complete and somewhat more precise solution of the
problem.

The details and nature of Burks’ study of foster and true children 
are well known. The more significant facts are as follows: (1) The 
population consists of native-born, white, non-Hebrew children of 
North European stock, (2) the standard deviation of the IQ’s of the 
foster and control groups of children are 15.09 and 15.13 IQ points, 
(3) the reliability of the lopped form of the Stanford Binet is .83 
giving true standard deviations of 13.75 and 13.78 IQ points, (4) the 
foster and control groups are matched for environmental differences, 
(5) the evidence is that selective factors in the placement and adoption 
of the foster children are negligible, (6) the multiple correlation 
corrected for attenuation between the IQ’s of the foster children and 
several measures of environmental differences is .42, and (7) the 
comparable correlation between IQ’s and environmental differences 
for the control group of true children is .61. Amplifications of certain 
of these aspects of the data will be presented later. 
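The corrections for unreliability that underlie these figures can be checked with a short sketch. It assumes the classical relations that a true standard deviation equals the observed standard deviation times the square root of the reliability, and that a correlation between two administrations of the same test is corrected for attenuation by dividing by that test's reliability. The function names are hypothetical; the numbers are those quoted for the Burks groups and for the Holzinger twins.

```python
import math


def true_sd(observed_sd: float, reliability: float) -> float:
    """True-score standard deviation under classical test theory."""
    return observed_sd * math.sqrt(reliability)


def disattenuate_same_test(r_observed: float, reliability: float) -> float:
    """Correct a correlation for attenuation when both variables are
    scores on the same test, so both share one reliability."""
    return r_observed / reliability


# Burks: observed SDs of 15.09 and 15.13, reliability .83.
foster_true_sd = true_sd(15.09, 0.83)
control_true_sd = true_sd(15.13, 0.83)

# Holzinger twins: raw correlation .88, assumed reliability .91.
twin_r_true = disattenuate_same_test(0.88, 0.91)

print(round(foster_true_sd, 2), round(control_true_sd, 2), round(twin_r_true, 3))
```

The sketch returns values that round to the 13.75 and 13.78 IQ points given above and to the corrected twin correlation of .967 used later in the paper.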





1 Leahy, Alice M.: “A Study of Adopted Children as Method of Investigating
Nature-Nurture.” J. Amer. Stat. Assoc. Proc., Vol. XXX, 1935, pp. 281-287.

2 Kelley, Truman L.: The Inheritance of Mental Traits. Chapter 23, pages 
423-443 in Psychologies of 1930. Worcester: Clark University Press. 










STATISTICAL METHODOLOGIES 


The statistical methodologies center around the path coefficient
technique which was developed by Sewall Wright¹,²,³ in connection with
studies of the relative influence of hereditary and environmental
differences on the piebald pattern of guinea pigs more than fifteen
years ago. So far, however, the technique has not been applied to a
study of the influence of hereditary and environmental differences on
man except in two inconspicuous instances. Burks, in the paper
cited above, devoted a few pages to the technique and, misinterpreting
the significance of parental intelligence in the complex of relationships,
came to the misleading conclusion that the “contribution of environ-
ment other than parental intelligence” amounts to only thirteen
hundredths of one per cent of the IQ variance. Wright himself
applied the technique to the data of Burks but in the process misquoted
Burks’ findings and failed to notice the most significant fact that the
control group of children was matched with the foster group for
environmental variability.⁴

Let $x_1, x_2, \ldots, x_n$ be true IQ deviations from the mean of a popula-
tion. Each deviation may be expressed as $x_1 = h_1 + a_1 + e_1, \ldots,
x_n = h_n + a_n + e_n$, in which $h_1 \ldots h_n$ represent the amount of the
$x$ deviations to be attributed to hereditary differences, in which
$a_1 \ldots a_n$ represent the amount of the $x$ deviations to be attributed
to accidents and intra-family environmental differences, and in which
$e_1 \ldots e_n$ represent the amount of the $x$ deviations to be attributed
to inter-family environmental differences. By definition $r_{ha}$ and $r_{ea}$
are zero. Squaring, summing, and dividing by $n$ gives

\[ \sigma_x^2 = \sigma_h^2 + \sigma_a^2 + \sigma_e^2 + 2 r_{he}\sigma_h\sigma_e \tag{1} \]

\[ R^2_{x \cdot hae} = 1.00 = \frac{\sigma_h^2}{\sigma_x^2} + \frac{\sigma_a^2}{\sigma_x^2} + \frac{\sigma_e^2}{\sigma_x^2} + \frac{2 r_{he}\sigma_h\sigma_e}{\sigma_x^2}. \tag{2} \]


The coefficient of determination of $x$ by $e$ ($d_{x \cdot e}$) or the percentage
of the variance in IQ to be attributed to inter-family environmental





1 Wright, Sewall: ‘‘Correlation and Causation.” Jour. Agric. Res., Vol. XX, 
1921, pp. 557-585. 

2 Wright, Sewall: “The Theory of Path Coefficients.” Genetics, Vol. VIII,
1923, pp. 239-255.

3 Wright, Sewall: “The Method of Path Coefficients.” Annals Math. Stat.,
Vol. V, 1934, pp. 161-215.

4 Wright, Sewall: ‘“‘Statistical Methods in Biology.” J. Amer. Stat. Assoc. 
Proc., Vol. XXVI, 1931, pp. 155-163. 










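differences is defined as $d_{x \cdot e} = \sigma_e^2/\sigma_x^2$. The path coefficient to $x$
from $e$ ($p_{x \cdot e}$) is defined as the square root of the coefficient of deter-
mination or $p_{x \cdot e} = \sigma_e/\sigma_x$, and similarly for $h$ and $a$. A solution of the
general problem given in equation (1) can be obtained from the
following formulae

\[ d_{x \cdot h} = p_{x \cdot h}^2 = \frac{\sigma_h^2}{\sigma_x^2} \tag{3} \]

\[ d_{x \cdot a} = p_{x \cdot a}^2 = \frac{\sigma_a^2}{\sigma_x^2} \tag{4} \]

\[ d_{x \cdot e} = p_{x \cdot e}^2 = \frac{\sigma_e^2}{\sigma_x^2} \tag{5} \]

\[ d_{x \cdot he} = \frac{2 r_{he}\sigma_h\sigma_e}{\sigma_x^2} \tag{6} \]

The solution of formulae (3), (4), and (5) is approached through the
path coefficients. When $r_{he}$ is zero

\[ r_{xe} = \frac{\sum (h + a + e)e}{N\sigma_x\sigma_e} = \frac{\sum e^2}{N\sigma_x\sigma_e} = \frac{\sigma_e}{\sigma_x}, \qquad p_{x \cdot e} = r_{xe} \tag{7} \]

and similarly for $p_{x \cdot h}$ and $p_{x \cdot a}$. When $r_{he}$ is positive or negative, then

\[ r_{xe} = \frac{\sum (h + a + e)e}{N\sigma_x\sigma_e} = \frac{\sum he}{N\sigma_x\sigma_e} + p_{x \cdot e} = p_{x \cdot h} r_{he} + p_{x \cdot e} \]

and

\[ p_{x \cdot e} = r_{xe} - p_{x \cdot h} r_{he} = \frac{r_{xe} - r_{xh} r_{he}}{1 - r_{he}^2}. \tag{8} \]

Solution of formula (8) gives $r_{he}$ and this in turn permits a solution
of formula (6). This derivation of formulae for the path coefficient
and subsequent derivations differ from the derivations presented by
Wright, but those who have been trained in educational and psychologi-
cal statistics should find them easier to follow.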
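Formula (8) can be put in computational form as a minimal sketch. The function name `path_coefficient` is hypothetical, and the second set of inputs is taken from the control-group solution reported later in this paper, used here only as a consistency check.

```python
def path_coefficient(r_xe: float, r_xh: float, r_he: float) -> float:
    """Formula (8): path coefficient from e to x when the two causes
    h and e are themselves correlated r_he."""
    return (r_xe - r_xh * r_he) / (1.0 - r_he ** 2)


# When r_he is zero the path coefficient reduces to the correlation,
# as in formula (7).
p_uncorrelated = path_coefficient(0.42, 0.8892, 0.0)

# Control-group values reported later: r_xe = .61, r_xh = .89481,
# r_he = .23919. The path coefficient should come back close to .42.
p_control = path_coefficient(0.61, 0.89481, 0.23919)

print(round(p_uncorrelated, 4), round(p_control, 4))
```

The second call illustrates the central point of the derivation: the raw correlation of .61 exceeds the path coefficient of .42 only because heredity and environment are correlated.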


THE TWIN DATA


It will be convenient to begin with an analysis of the data from the 
identical twins reported by Holzinger which will yield a measure of 
the contribution of accidents and of intra-family environmental 
differences to IQ variance. That is, we begin with a solution of 
formula (4). Figure 1 presents the essential relationships in which 
the straight arrows indicate the direction of causal forces while the 










curved double headed arrows indicate that factors are related through
prior common causes: $x$ and $y$ represent the pairs of identical twins,
each pair has the same heredity ($h$) and the same inter-family environ-
ment ($e$), but accidents and intra-family environmental factors
represented by $a$ and $b$ are different. The logic of this simple situation
has not always been clear. The resemblance between pairs of identical
twins, in this case a correlation corrected for attenuation of .967,
represents the combined and inextricable influence of both hereditary
and of inter-family environmental differences. The degree to which
this resemblance falls short of unity represents the influence of acci-
dents and of intra-family environmental differences and this is the
only unequivocal datum yielded by the study of identical twins.

[Figures 1 to 4: path diagrams. Legible panel titles: Fig. 3, Selective Factors; Fig. 4, Foster Siblings.]

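We begin with the following known relationships: $\sigma_x = \sigma_y$; $r_{xh} = r_{yh}$;
$r_{xe} = r_{ye}$; $r_{xa} = r_{yb}$; $\sigma_a = \sigma_b$; $r_{ha} = r_{hb} = r_{ea} = r_{eb} = r_{ab} = 0$;
$r_{xy} = .967$; and $\sigma_x = 13.75$ IQ points.

Then

\[ r_{xy} = .967 = \frac{\sum (h + a + e)(h + b + e)}{N\sigma_x\sigma_y} = \frac{\sigma_h^2 + \sigma_e^2 + 2 r_{he}\sigma_h\sigma_e}{\sigma_x^2} \]

\[ R^2_{x \cdot hae} = 1.00 = \frac{\sigma_h^2 + \sigma_a^2 + \sigma_e^2 + 2 r_{he}\sigma_h\sigma_e}{\sigma_x^2} \]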










Hence

\[ r_{xy} = .967 = 1.00 - \frac{\sigma_a^2}{\sigma_x^2} = 1.00 - p_{x \cdot a}^2 = 1.00 - d_{x \cdot a} \tag{9} \]

\[ p_{x \cdot a} = r_{xa} = \sqrt{1 - r_{xy}} = .18166 \]

or the correlation between $x$ and $a$ is .18166. Since $\sigma_x = 13.75$ IQ
points, then

\[ \sigma_a = p_{x \cdot a}\sigma_x = 2.4978 \text{ IQ points.} \tag{10} \]

Finally,

\[ d_{x \cdot a} = \frac{\sigma_a^2}{\sigma_x^2} = 1 - r_{xy} = .0330 \]

or 3.30 per cent of the variance in $x$ is to be attributed to accidents
and intra-family environmental differences. The influence of acci-
dents and intra-family environmental differences for children in
general is doubtless greater than these data suggest.
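The arithmetic of formulae (9) and (10) can be retraced in a few lines. This is a sketch under the paper's own assumptions (a corrected twin correlation of .967 and a true standard deviation of 13.75 IQ points); the variable names are hypothetical.

```python
import math

r_xy = 0.967      # twin correlation corrected for attenuation
sigma_x = 13.75   # assumed true SD of IQ, in IQ points

d_xa = 1.0 - r_xy            # formula (9): share of variance due to a
p_xa = math.sqrt(d_xa)       # path coefficient from a to x
sigma_a = p_xa * sigma_x     # formula (10): SD attributable to a

print(f"d_xa = {d_xa:.4f}, p_xa = {p_xa:.5f}, sigma_a = {sigma_a:.4f}")
```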


THE FOSTER CHILDREN DATA 


Assuming that the above solution of formula (4) holds at least
approximately for Burks’ population, an analysis of her data on foster
children will yield in addition a solution of formulae (3) and (5).
Figure 2 presents the essential relationships: $x$ represents the true
deviations of IQ’s from the mean of the foster population, $a$ represents
accidents and intra-family environmental differences, $h$ represents
hereditary differences and $e$ represents inter-family environmental
differences. By definition $r_{ha}$ and $r_{ea}$ are zero. Let it be assumed for
the moment that selective factors in the placement and adoption of
these children are zero or negligible, i.e., that $r_{he} = 0$. $r_{xE} = .42$ is
the multiple correlation corrected for attenuation between IQ’s and
several measures of environmental factors and let it be assumed for
the moment that $r_{xE} = r_{xe}$. Then

\[ \sigma_x^2 = \sigma_h^2 + \sigma_a^2 + \sigma_e^2. \]

From the twin data we have

\[ p_{x \cdot a} = r_{xa} = .18166 \]

\[ \sigma_a = 2.4978 \text{ IQ points} \]

\[ d_{x \cdot a} = .033 \text{ or 3.30 per cent.} \]







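Analysis of the inter-family environmental differences gives

\[ p_{x \cdot e} = r_{xe} = .42 \]

\[ \sigma_e = .42\sigma_x = 5.775 \text{ IQ points} \]

\[ d_{x \cdot e} = \frac{\sigma_e^2}{\sigma_x^2} = .1764 \text{ or 17.64 per cent.} \]

Analysis of the hereditary differences gives

\[ \sigma_h = \sqrt{\sigma_x^2 - \sigma_a^2 - \sigma_e^2} = 12.226 \text{ IQ points} \]

\[ p_{x \cdot h} = r_{xh} = \frac{\sigma_h}{\sigma_x} = .8892 \]

\[ d_{x \cdot h} = \frac{\sigma_h^2}{\sigma_x^2} = .7906 \text{ or 79.06 per cent.} \]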


This solution indicates that 3.30 per cent of IQ variance should 
be attributed to accidents and intra-family environmental factors, 
17.64 per cent to inter-family environmental factors and 79.06 per 
cent to hereditary differences. This solution, of course, represents 
only a minor refinement of the conclusions reached by Burks. 
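The foster-group partition can be reproduced from three inputs: the true standard deviation, the twin-derived share for accidents, and the corrected environmental correlation. The sketch below assumes, as the text does at this stage, that heredity and environment are uncorrelated in the foster group; variable names are hypothetical.

```python
import math

sigma_x = 13.75   # true SD of IQ in the foster group
d_xa = 0.033      # share due to accidents, from the twin data
r_xe = 0.42       # corrected multiple correlation, taken as p_xe

d_xe = r_xe ** 2                 # share due to inter-family environment
d_xh = 1.0 - d_xa - d_xe         # remainder attributed to heredity

sigma_e = r_xe * sigma_x
sigma_h = math.sqrt(d_xh) * sigma_x
p_xh = sigma_h / sigma_x

for label, share in (("accidents", d_xa), ("environment", d_xe), ("heredity", d_xh)):
    print(f"{label:12s} {100 * share:6.2f} per cent")
```

Because the three causes are taken as uncorrelated, the shares are simple squared path coefficients and add to one hundred per cent without a joint term.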

Before presenting a solution of the contribution of $2 r_{he}\sigma_h\sigma_e$ to IQ
variance among children reared in the families of their true parents,
two limitations of the data from the foster population require further
consideration.


SELECTIVE FACTORS IN THE FOSTER POPULATION 


The above solution of the relative contributions of hereditary and
environmental differences to IQ variance depends on the assumption
that $r_{he} = 0$ or that selective factors in the placement and adoption
of these children are zero or negligible. That some selective factors
operated for some of the Stanford foster children is clear from the fact
that IQ’s estimated on the basis of the available records concerning
the true parents of one hundred fifty-eight out of the three hundred
seventy-two foster children ($E'$) showed a positive correlation of
.14 with the foster mother’s mental age ($E$). Burks also reports a
correlation of .18 between these estimated IQ’s and actual IQ’s ($x$).
Correcting these values for attenuation gives the following approximate
figures for the one hundred fifty-eight cases: $r_{EE'} = .197$, $r_{xE'} = .246$
and $r_{xE} = .42$. These relationships are presented graphically in
Figure 3. $E'$ represents the factors concerning the child’s true parents
which were used in estimating IQ and the question at issue is whether
knowledge of these factors on the part of the placement agencies
creates a spurious correlation between the quality of the foster home
($E$) and the child’s IQ ($x$). Applying the path coefficient technique
(formula 8) gives $p_{x \cdot E} = .387$, or the correlation between $x$ and $E$
when $E'$ is held constant while the variability of $E$ remains as large as
before is .387. The difference between .42 and .387 is a measure of
the amount of the selective factors involved in the placement and
adoption of these children. The difference, however, may exaggerate
the selective factors since less than half of the foster children are
involved. Hence, the assumption is justified that selective factors are
zero or negligible and that the correlation between the quality of
the child’s native intellectual endowment and the quality of his
foster home ($r_{he}$) is also zero or negligible.
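The figure of .387 follows from formula (8) with the estimated-IQ variable $E'$ in the role of the correlated second cause. The sketch below uses the three corrected correlations quoted above; the function and variable names are hypothetical.

```python
def path_coefficient(r_xe: float, r_xh: float, r_he: float) -> float:
    """Formula (8), written for two correlated causes of x."""
    return (r_xe - r_xh * r_he) / (1.0 - r_he ** 2)


r_x_env = 0.42        # IQ with foster-home measures (E)
r_x_est = 0.246       # IQ with IQ estimated from true-parent records (E')
r_env_est = 0.197     # foster-home measures with estimated IQ

p_x_env = path_coefficient(r_x_env, r_x_est, r_env_est)
selective_share = r_x_env - p_x_env   # part of .42 traceable to selection

print(round(p_x_env, 3), round(selective_share, 3))
```

On these inputs only about three hundredths of the .42 correlation is traceable to selective placement, which is the basis for treating it as negligible.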


MEASUREMENT OF ENVIRONMENTAL FACTORS IN THE FOSTER POPULATION 


The above solution of the relative contributions of hereditary and
environmental differences to IQ variance in the foster population also
depends on the assumption that $r_{xE} = r_{xe} = .42$, in which $r_{xE}$ is the
multiple correlation corrected for attenuation between IQ’s and
several environmental measures such as family income and mental
age of foster parents. That is, it is assumed that the battery of
environmental measures is adequate and that the reliabilities employed
in correcting for attenuation are also adequate. This is a most impor-
tant assumption because the whole of the solution rests on the adequacy
of the correlation of .42.

The data available for the testing of these assumptions are ex- 
tremely meager. A test, however, will serve to direct future research 
to a type of data the significance of which has not been recognized 
by students of the nature versus nurture problem and which should 
be far more crucial than any as yet collected. The test employs 
data showing the resemblance of foster siblings, that is, adopted chil- 
dren with different heredities reared together from an early age in 
the same family. Figure 4 presents the test situation in graphic 
form. The true IQ deviations from the mean are represented by $x$
and $y$. The symbols $h$ and $h'$ represent hereditary differences and $a$
and $b$ represent accidents and intra-family environmental differences;
the intercorrelations of these four variables are zero. Let $E$ represent
the true scores on a battery of measures of the common environment
of the pairs of foster siblings and let $e$ represent the amount of the $x$
and $y$ deviations to be attributed to environmental differences. The
correlations of $E$ and $e$ with $a$, $h$, $h'$, and $b$ are zero. Now

\[ r_{xy} = \frac{\sum (h + a + e)(h' + b + e)}{N\sigma_x\sigma_y} = \frac{\sigma_e^2}{\sigma_x^2} = d_{x \cdot e}. \tag{11} \]

Hence, if selective factors in the placement of foster siblings are zero
and if the children were adopted at an early age, then $r_{xy}$ gives directly
the proportion of the IQ variance to be attributed to inter-family
environmental differences. Such data can also be used to test the
adequacy of a battery of measures of environmental factors since

\[ \sqrt{r_{xy}} = p_{x \cdot e} = r_{xe}. \tag{12} \]

Hence, if

\[ \sqrt{r_{xy}} = r_{xE} \tag{13} \]

then the equality of $r_{xE}$ and $r_{xe}$ would be established.

The available data for the application of this test consist of IQ’s
of only twenty-one pairs of foster siblings. Burks reports the correla-
tion as .23.¹ Correcting for attenuation and taking the square root
gives $r_{xe} = .53$. The probable error of this correlation is at least
±.11 and is apt to be as high as ±.17, so that the practical value of
this correlation for the purpose of testing the adequacy of $r_{xE} = .42$
is very small. The test, however, lends some small support to the
accuracy of the .42 correlation and this, in connection with the methods
employed by Burks in obtaining this correlation, justifies the assumption
that $r_{xE} = r_{xe}$.
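The foster-sibling check of formula (12) amounts to one line of arithmetic. The sketch assumes that the reliability used for the correction is the .83 quoted earlier for the shortened Stanford Binet; the text does not state which reliability was applied, though .83 reproduces the reported .53. Names are hypothetical.

```python
import math

r_siblings_raw = 0.23   # Burks' correlation for 21 pairs of foster siblings
reliability = 0.83      # assumed: reliability of the shortened Stanford Binet

# Both siblings take the same test, so the correction divides by the
# reliability itself.
r_siblings_true = r_siblings_raw / reliability

# Formula (12): the square root of the sibling correlation estimates r_xe.
r_xe_from_siblings = math.sqrt(r_siblings_true)

print(round(r_siblings_true, 3), round(r_xe_from_siblings, 2))
```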


THE TRUE CHILDREN DATA 


We come now to an aspect of the problem which so far has not
even been explored, namely: The contribution of the factor $2 r_{he}\sigma_h\sigma_e$
to IQ variance, which involves the solution of formula (6). Three
simple facts concerning the control group of children reared in the
families of their true parents permit a solution of this problem. First,
the correlation corrected for attenuation between IQ’s and a battery
of environmental tests is .61 whereas the comparable correlation
in the foster group is only .42. Second, the control group is identical
with the foster group in respect to variability of IQ, that is, ${}_c\sigma_x = {}_f\sigma_x$,
the actual values being 13.75 and 13.78. The slight discrepancy will





1 Similar data were reported on seventy-two pairs, the correlation corrected for
attenuation being .37, in the Chicago study of foster children, but the significance
of the data was apparently not realized. Freeman, Frank N., et al.: The Influence
of Environment on the Intelligence, School Achievement, and Conduct of Foster
Children. The Twenty-seventh Year Book of the National Society for the Study
of Education, Part I, pp. 103-217.
































be ignored. Third, the control group of true children is identical
with the foster group in respect to environmental variability. Con-
trol and foster families were matched for locality, types of neighbor-
hood, and occupational field of father. The variability of the control
fathers and mothers is almost identical with the variability of foster
fathers and mothers in respect to mental age, age, and school grades
completed. The variability of the control homes is almost identical
with the variability of the foster homes in respect to the Whittier
ratings and the culture index. These data are summarized in Table I.
That is, ${}_c\sigma_E = {}_f\sigma_E$. These three facts give the following basic
relationships where the subscripts $c$ and $f$ identify the control and
foster groups.

${}_c\sigma_x = {}_f\sigma_x = 13.75$ IQ points (value calculated directly)

${}_c\sigma_E = {}_f\sigma_E$ (see evidence cited above and Table I)

${}_c\sigma_e = {}_f\sigma_e$ (from above equalities and formulae (11), (12) and (13))

${}_c\sigma_e^2 / {}_c\sigma_x^2 = {}_f\sigma_e^2 / {}_f\sigma_x^2 = {}_c d_{x \cdot e} = {}_f d_{x \cdot e} = .1764$ (from above equalities, value calculated from foster population)

${}_c\sigma_a = {}_f\sigma_a = 2.4978$ IQ points (from the equality of ${}_c\sigma_x$ and ${}_f\sigma_x$ and from formula 10)

${}_c r_{xe} = p_{x \cdot e} + p_{x \cdot h} r_{he} = .61$ (formula 8, but note that ${}_f r_{xe} = p_{x \cdot e} = .42$)

TABLE I. STANDARD DEVIATIONS OF ENVIRONMENTAL VARIABLES IN THE FOSTER
AND CONTROL HOMES; DATA OF BURKS

                                                        Standard deviations in
                                  Original standard     foster homes relative
Environmental variables              deviations           to control homes
                                   Foster   Control        Foster   Control
Mental age, father ..............    2.6      3.0             87      100
Mental age, mother ..............    3.0      2.8            107      100
[illegible] .....................    7.1      7.2             99      100
[illegible] .....................    6.1      5.3            115      100
Education, father ...............    3.9      4.0             97      100
Education, mother ...............    3.2      2.9            110      100
[illegible] .....................    1.9      2.3             83      100
[illegible] .....................    4.2      4.3             98      100
Combined standard deviations¹ ...    ...      ...          282.9    282.8


















1 This is simply the square root of the sum of the squares of the relative stand- 
ard deviations. The correspondence is extraordinarily close. The foster homes, 
however, are more variable with respect to family income reflecting a higher 
average income and the fact that the distribution of incomes is markedly skewed. 
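The combined figures in the last row of Table I can be verified directly from the footnote's definition, the square root of the sum of the squared relative standard deviations. The list below simply transcribes the relative values in the order the table gives them; the variable names are hypothetical.

```python
import math

# Relative standard deviations (control = 100) for the eight variables.
foster_relative = [87, 107, 99, 115, 97, 110, 83, 98]
control_relative = [100] * 8

combined_foster = math.sqrt(sum(v ** 2 for v in foster_relative))
combined_control = math.sqrt(sum(v ** 2 for v in control_relative))

print(round(combined_foster, 1), round(combined_control, 1))
```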










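From these data we may calculate $p_{x \cdot h}$ and $r_{he}$ by solving the
following equations simultaneously

\[ {}_c r_{xe} = p_{x \cdot e} + p_{x \cdot h} r_{he} \tag{14} \]

\[ {}_c R^2_{x \cdot hae} = \frac{\sigma_h^2}{\sigma_x^2} + \frac{\sigma_a^2}{\sigma_x^2} + \frac{\sigma_e^2}{\sigma_x^2} + \frac{2 r_{he}\sigma_h\sigma_e}{\sigma_x^2}. \tag{15} \]

Substituting known values in these equations gives

\[ .61 = .42 + p_{x \cdot h} r_{he} \]

\[ 1.00 = p_{x \cdot h}^2 + .033 + .1764 + 2 r_{he}\, p_{x \cdot h}(.42). \]

Solving for the unknowns gives the following results

\[ p_{x \cdot h} = .79436 \]

\[ r_{he} = .23919. \]

From which we may also determine

\[ d_{x \cdot h} = .63100 \]

\[ r_{xh} = p_{x \cdot h} + p_{x \cdot e} r_{he} = .89481 \]

\[ \sigma_h = .79436\sigma_x = 10.922 \text{ IQ points} \]

\[ d_{x \cdot he} = \frac{2 r_{he}\sigma_h\sigma_e}{\sigma_x^2} = .1596. \]

This completes the solution of the problem to a first approximation.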
The variance in IQ among children reared in the families of their 
true parents is to be accounted for as follows: 63.10 per cent to heredi- 
tary differences, 17.64 per cent to inter-family environmental differ- 
ences, 3.30 per cent to accidents and intra-family environmental 
differences, and 15.96 per cent to the joint contribution of hereditary 
and environmental differences or to the correlation of .239 between 
the quality of native endowments and of environments. 
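The simultaneous solution of equations (14) and (15) can be followed step by step. Equation (14) fixes the product of the heredity path and the heredity-environment correlation at .61 − .42 = .19; substituting that product into (15) leaves the heredity share as the only unknown. The sketch uses the paper's input values; variable names are hypothetical.

```python
import math

r_xe_control = 0.61   # corrected IQ-environment correlation, control group
p_xe = 0.42           # path coefficient for environment (foster group)
d_xa = 0.033          # share due to accidents (twin data)
d_xe = 0.1764         # share due to inter-family environment
sigma_x = 13.75       # true SD of IQ

product = r_xe_control - p_xe        # p_xh * r_he, from equation (14)
d_xhe = 2.0 * product * p_xe         # joint contribution, formula (6)
d_xh = 1.0 - d_xa - d_xe - d_xhe     # heredity share, from equation (15)

p_xh = math.sqrt(d_xh)
r_he = product / p_xh
r_xh = p_xh + p_xe * r_he
sigma_h = p_xh * sigma_x

print(f"p_xh = {p_xh:.5f}  r_he = {r_he:.5f}  r_xh = {r_xh:.5f}")
print(f"d_xh = {d_xh:.4f}  d_xhe = {d_xhe:.4f}  sigma_h = {sigma_h:.3f}")
```

The four shares (.6310, .1764, .0330 and .1596) add to unity, which is a convenient check on the transcription of the formulae.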

Attention is called to two features of this solution: first to the fact 
that the standard deviation of IQ’s to be attributed to hereditary 
differences is only 10.92 points whereas in the foster population the 
comparable figure is 12.23; and second, to the low correlation of .239 
between the quality of native endowments and of environments. 
These features are two aspects of the same phenomenon, but they require
separate consideration. Discussion and interpretation of the low 
correlation between endowments and environments will be presented 
at a later point. 




SOLUTION OF THE PROBLEM TO A SECOND AND THIRD APPROXIMATION 


In the collection of her data Burks matched the foster and control 
groups for variability of environmental differences. By accident it 
also happened that the two groups proved to be alike in respect to 
variability of IQ. From the fact that the two groups are alike in 
respect to IQ and environmental variability and from the fact that 
the correlation between IQ and environmental differences is .42 in 
the foster group and .61 in the control group, it follows that the 
standard deviation of the IQ’s to be attributed to hereditary differ- 
ences is only 10.92 points in the control group in comparison with 
12.23 in the foster group. This raises the important question of what 
would have happened if the two groups had been matched for heredi- 
tary and environmental variability instead of for IQ and environmental 
variability. If the two groups are to be matched for hereditary and 
environmental variability, two possibilities are available: Either 
$\sigma_h = 12.23$ IQ points in the foster group should be lowered to match
$\sigma_h = 10.92$ points in the control group or vice versa. Choice of procedure
depends on whether the foster group represents an abnormally
large variability of hereditary differences or whether in the selection 
of the control children the variability of hereditary differences was 
abnormally restricted. Rather than decide this question it seems best 
to match in both directions. The resulting data together with the 
figures already reported are summarized in Table II. The starred 
figures represent the basic data, all other figures are derivative. 

Column one summarizes the data already reported on the foster
group of children. Taking $\sigma_h$ as 10.922 IQ points instead of 12.226
IQ points and letting $\sigma_x$ change to conform to the new value for $\sigma_h$
gives the data of column two, which may be regarded as a second
approximation to a solution of the problem for foster children. Since
$\sigma_x$ is smaller it follows that environmental differences account for a
larger proportion of the smaller variance in IQ.
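The second approximation for the foster group can be sketched by holding the environmental and accidental standard deviations fixed and substituting the control group's hereditary standard deviation. That the other two components are held fixed is an assumption of this sketch, though it reproduces the 12.605 entered in column two of Table II. Variable names are hypothetical.

```python
import math

sigma_h_matched = 10.922   # hereditary SD taken from the control group
sigma_a = 2.4978           # accidents and intra-family environment
sigma_e = 5.775            # inter-family environment

# With r_he = 0 in the foster group, the variances simply add.
variance_x = sigma_h_matched ** 2 + sigma_a ** 2 + sigma_e ** 2
sigma_x_matched = math.sqrt(variance_x)

d_xe_matched = sigma_e ** 2 / variance_x   # environmental share, now larger

print(round(sigma_x_matched, 3), round(d_xe_matched, 4))
```

The environmental share rises from .1764 to roughly .21 solely because the total variance has shrunk, not because the environmental differences themselves have changed.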

Column three summarizes the data already reported on the control
group of true children. Taking $\sigma_h$ as 12.226 and correcting $r_{he}$ for
restricted range by formula (186) of Kelley’s Statistical Method gives
the data of column four which may be regarded as a second
approximation to a solution of the problem for children reared in the families
of their true parents. Next, correcting $r_{xe}$ for restricted range
($\Sigma_1/\sigma_1$ = 15.053/13.750) gives $r_{xe}$ = .6444, $r_{he}$ = .3390, and $\sigma_x$ = 15.393.
Correcting $r_{xe}$ for restricted range once more (15.393/13.750) gives




the values of column five. This third approximation to a solution
of the problem for children reared in the families of their true parents
represents close to maximum values for $\sigma_x$, $r_{xe}$, and $r_{he}$. It is
to be noted that the second and third approximations raise the
correlation between endowments and environments from .239 to .266
and .353.
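
The successive corrections for restricted range may be checked numerically. The following is a minimal sketch in Python, assuming that formula (186) of Kelley is the usual correction of a correlation for restriction of range, $R = rk/\sqrt{1 - r^2 + r^2k^2}$, in which $k$ is the ratio of the unrestricted to the restricted standard deviation. The function name is hypothetical; the inputs are the figures cited above.

```python
import math


def correct_for_restricted_range(r, sd_restricted, sd_full):
    """Assumed form of the restriction-of-range correction.

    r is the correlation observed where the standard deviation is
    sd_restricted; the result estimates it where the SD is sd_full.
    """
    k = sd_full / sd_restricted
    return r * k / math.sqrt(1.0 - r * r + r * r * k * k)


# r_he for the control group when sigma_h is raised from 10.922 to 12.226
r_he_column_4 = correct_for_restricted_range(0.2392, 10.922, 12.226)

# r_xe when sigma_x is raised from 13.750 to 15.053, and then to 15.393
r_xe_first = correct_for_restricted_range(0.6100, 13.750, 15.053)
r_xe_second = correct_for_restricted_range(0.6100, 13.750, 15.393)

print(round(r_he_column_4, 4), round(r_xe_first, 4), round(r_xe_second, 4))
```

Under this assumption the three results agree to within rounding with .2658, .6444, and .6528, the values cited in the text and entered in columns four and five of Table II.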


TABLE II. SUMMARY OF FINDINGS FROM FOSTER AND CONTROL GROUPS; BURKS’ DATA; BASED ON TRUE SCORES; ALL CORRELATIONS CORRECTED FOR ATTENUATION

| Standard deviations, correlations, path coefficients and coefficients of determination | Foster group: original data (1) | Foster group: matched to control group for $\sigma_h$ (2) | Control group: original data (3) | Control group: matched to foster group, $r_{he}$ = .2658 (4) | Control group: matched to foster group, $r_{xe}$ = .6528 (5) |
|---|---|---|---|---|---|
| Standard deviations: | | | | | |
| $\sigma_x$ (intelligence) | 13.750* | 12.605 | 13.750* | 15.053 | 15.457 |
| $\sigma_h$ (heredity) | 12.226 | 10.922* | 10.922 | 12.226* | 12.226* |
| $\sigma_a$ (accidents, etc.) | 2.498* | 2.498* | 2.498* | 2.498* | 2.498* |
| $\sigma_e$ (inter-family environment) | 5.775 | 5.775* | 5.775* | 5.775* | 5.775* |
| Correlations: | | | | | |
| $r_{xh}$ (IQ with heredity) | .8892 | .8665 | .8948 | .9142 | .9228 |
| $r_{xa}$ (IQ with accidents) | .1817 | .1982 | .1817 | .1659 | .1616 |
| $r_{xe}$ (IQ with inter-family environment) | .4200* | .4582 | .6100* | .5995 | .6528* |
| $r_{he}$ (heredity with inter-family environment) | .0000* | .0000* | .2392 | .2658* | .3529 |
| Path coefficients: | | | | | |
| $p_{xh}$ | .8892 | .8665 | .7944 | .8122 | .7910 |
| $p_{xa}$ | .1817 | .1982 | .1817 | .1659 | .1616 |
| $p_{xe}$ | .4200 | .4582 | .4200 | .3836 | .3736 |
| Coefficients of determination (expressed in percentages; proportional contributions of different factors to IQ variance): | | | | | |
| $d_{xh}$ (heredity) | 79.06% | 75.08% | 63.10% | 65.97% | 62.57% |
| $d_{xa}$ (accidents, etc.) | 3.30% | 3.93% | 3.30% | 2.75% | 2.61% |
| $d_{xe}$ (inter-family environment) | 17.64% | 20.99% | 17.64% | 14.72% | 13.96% |
| Correlation of endowments and environments | 0.00% | 0.00% | 15.96% | 16.56% | 20.86% |
| Total | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% |
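
The derivative figures of Table II follow from the standard deviations and the correlation between heredity and environment. The sketch below is illustrative only. It assumes that IQ deviations are the sum of a hereditary, an accidental, and an inter-family environmental component, that accidents are uncorrelated with the other two, and that heredity and environment correlate $r_{he}$; the function and key names are hypothetical.

```python
import math


def decompose(sd_h, sd_a, sd_e, r_he):
    """Illustrative helper: derive the Table II quantities from the SDs."""
    joint = 2.0 * r_he * sd_h * sd_e           # contribution of the correlation
    var_x = sd_h**2 + sd_a**2 + sd_e**2 + joint
    sd_x = math.sqrt(var_x)
    return {
        "sd_x": sd_x,
        "r_xh": (sd_h + r_he * sd_e) / sd_x,   # IQ with heredity
        "r_xa": sd_a / sd_x,                   # IQ with accidents
        "r_xe": (sd_e + r_he * sd_h) / sd_x,   # IQ with environment
        "p_xh": sd_h / sd_x,                   # path coefficients
        "p_xa": sd_a / sd_x,
        "p_xe": sd_e / sd_x,
        "d_xh": 100.0 * sd_h**2 / var_x,       # per cent of IQ variance
        "d_xa": 100.0 * sd_a**2 / var_x,
        "d_xe": 100.0 * sd_e**2 / var_x,
        "d_joint": 100.0 * joint / var_x,
    }


column_1 = decompose(12.226, 2.498, 5.775, 0.0)
column_3 = decompose(10.922, 2.498, 5.775, 0.2392)
column_5 = decompose(12.226, 2.498, 5.775, 0.3529)

for label, column in (("(1)", column_1), ("(3)", column_3), ("(5)", column_5)):
    print(label, {key: round(value, 4) for key, value in column.items()})
```

On these assumptions the computed values agree with columns one, three, and five of the table to within rounding, and the four coefficients of determination in each column sum to one hundred per cent.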

























Having illustrated the results which follow from matching the two 
groups in both directions it is in order to note that as a matter of 
actual procedure Burks matched the control group to the foster 
group. Hence, the data of column one represent the preferred solu- 
tion for the foster children. Column three represents a precise solution 
of the problem for the control group of children, but since these children 
were arbitrarily selected to match the foster group for environmental 
differences, the data of columns four and five probably represent some- 
what better solutions for an unselected group of children reared in 
the families of their true parents. 


DATA OF ALICE M. LEAHY 


In a recent article, obiter cite, Alice M. Leahy reported preliminary 
findings from a study of a foster and control population of Minnesota 
children. Essentially this is a repetition of Burks’ study and provides
a most remarkable confirmation of her findings. Since the available 
data are preliminary and only a few zero order correlations uncorrected 
for attenuation are reported a solution of the problem for Leahy’s 
populations is impossible, but the general trend of the results is clear 
enough. In the foster population from fifteen per cent to twenty 
per cent of the IQ variance should be attributed to inter-family environ- 
mental differences, the comparable proportion for Burks’ foster popula- 
tion being 17.64 per cent. On the other hand the absolute size of the 
variance or of the standard deviation of IQ’s to be attributed to
inter-family environmental differences should be considerably smaller 
than in Burks’ foster population. Two features distinguish the control 
population of Minnesota children: First, the standard deviation of 
IQ’s is much larger than in the foster population; and second, the 
multiple correlation corrected for attenuation between IQ and environ- 
mental factors should be much higher than in Burks’ control pop- 
ulation. It follows that the correlation between endowments and 
environments in the Minnesota population should be much higher (.40 
to .50) than in the California population (.24 to .35); that the contribution
of this correlation or of $2r_{he}\sigma_h\sigma_e$ should account for a much larger
proportion of the IQ variance (twenty per cent to thirty per cent) than 
in the California population (16.0 per cent to 20.9 per cent); and that 
the proportionate contributions of hereditary and of environmental 
differences to IQ variance should be smaller than in the California 
population. The apparently high correlation between endowments 
and environments suggested by the preliminary data of Leahy will 







be used at a later point to develop the implications of this aspect of 
the problem. 


SUMMARY OF FOREGOING SOLUTIONS 


We turn now to the task of making the foregoing solutions intel- 
ligible and to the development of certain implications of the data. 
Figure 5 presents the essentials of the matter in graphic form. The 
solutions are for California children of non-Hebrew, North European 
stock reared in the homes of their true parents and are based on the 
data of Barbara Burks (A, B, and C correspond respectively to columns 





FIG. 5. FACTORS DETERMINING DIFFERENCES IN INTELLIGENCE
(Legend: hereditary differences; accidents and intra-family environmental differences; inter-family environmental differences; correlation of environments and endowments.)


3, 4 and 5 of Table II). From 62.6 per cent to 66.0 per cent of the 
variance or individual differences in IQ is attributed to hereditary 
differences; from 2.6 per cent to 3.3 per cent of the variance is attrib- 
uted to accidents and intra-family environmental differences or to 
factors which make the environment of two children reared in the same 
family somewhat different; from 13.9 per cent to 17.6 per cent to inter- 
family environmental differences or to such factors as parental intel- 
ligence, family income, and cultural status of the home; and from 16.0 
per cent to 20.9 per cent to the joint contribution of hereditary and 
of inter-family environmental differences or to the correlation between 
endowments and environments or to the fact that there is a tendency 
for nature and nurture to work together to increase or to decrease 
the level of intelligence. Solution A in Figure 5 follows directly from 













the data reported by Burks. Solutions B and C result from matching 
the control population to the foster population for environmental 
and hereditary variability instead of for environmental and IQ 
variability. These solutions eliminate the influence of errors of 
measurement and are based on correlations corrected for attenuation. 


FIG. 6. VARIABILITY OF IQ’S DUE TO DIFFERENT FACTORS
(Bars: obtained IQ’s; true IQ’s; hereditary differences; environmental differences; errors of measurement; accidents, etc.; correlation. Each bar marks the total range of IQ’s, plus and minus two sigma, and plus and minus one sigma.)


A second and for some purposes more meaningful representation 
of these data is in terms of standard deviations and inter-quartile 
ranges as in Figure 6 (Solution A of Figure 5, Column 3 of Table II). 
The first bar is a schematic representation of the total distribution 
of IQ’s including errors of measurement: One child in a thousand will
have a test IQ as low as fifty-six, one in a thousand as high as one 













hundred forty-four; the standard deviation is 14.4 IQ points; and 
the inter-quartile range including fifty per cent of the cases is from 
ninety to one hundred ten. When errors of measurement are elimi- 
nated, the second bar gives the distribution of true IQ’s. The third 
bar represents the IQ’s to be attributed to hereditary differences.
This is the probable distribution of true IQ’s for a representative 
population of California children (representative, that is, of the 
hereditary differences existing in California) reared from conception 
to the age of testing in average and strictly identical environments. 
For the middle fifty per cent of children hereditary differences raise 
or lower the true IQ as much as seven points; one child in a thousand 
will be lucky enough to have a native endowment warranting an IQ 
of one hundred thirty-four and one in a thousand will be unlucky 
enough to have a native endowment warranting an IQ of only sixty- 
six. Similarly, the fourth bar represents the individual differences 
in IQ to be attributed to inter-family environmental differences. For 
the middle fifty per cent of homes environmental differences raise or 
lower the true IQ about four points; one home in a thousand may 
provide such a superior environment as to increase the IQ as much as 
eighteen points and one home in a thousand may decrease the IQ 
eighteen points. The fifth and sixth bars represent the individual 
differences in IQ to be attributed to errors of measurement and to 
accidents and intra-family environmental differences. The contribu- 
tion of the correlation between endowments and environments, how- 
ever, cannot be expressed in standard deviations and inter-quartile 
ranges and it has been necessary to draw the last bar representing this 
factor somewhat arbitrarily,¹ but it gives a correct visual impression
relative to the other factors. These results must be regarded as 
approximations and subject to fairly large and indeterminate errors. 
A more precise approximation, probably increasing the contribution 
of environmental differences to IQ variance, can only be obtained by 
the study of a considerable number of foster siblings. 
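
The ranges quoted for the several bars can be recovered from the standard deviations alone. The following sketch assumes a normal distribution about a mean of one hundred, takes the middle fifty per cent as lying within .6745 sigma of the mean, and takes "one in a thousand" as the point cut off by the upper or lower .001 of the curve. The function name is hypothetical; the standard deviations are those given in the text and in column three of Table II.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist()  # mean 0, standard deviation 1


def spread(sd, mean=100.0):
    """Quartile deviation and one-in-a-thousand limits for a normal factor."""
    z_quartile = UNIT_NORMAL.inv_cdf(0.75)    # about 0.6745
    z_thousand = UNIT_NORMAL.inv_cdf(0.999)   # about 3.09
    return {
        "quartile_deviation": z_quartile * sd,
        "low": mean - z_thousand * sd,
        "high": mean + z_thousand * sd,
    }


obtained = spread(14.4)          # test IQ's including errors of measurement
hereditary = spread(10.922)      # sigma_h, column three of Table II
environmental = spread(5.775)    # sigma_e, inter-family environment

for label, bar in (("obtained", obtained),
                   ("hereditary", hereditary),
                   ("environmental", environmental)):
    print(label, {key: round(value, 1) for key, value in bar.items()})
```

On these assumptions the limits fall close to fifty-six and one hundred forty-four for obtained IQ’s, sixty-six and one hundred thirty-four for hereditary differences, and eighteen points either way for inter-family environment, with quartile deviations of about ten, seven, and four points.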


IMPLICATIONS FOR ENVIRONMENTAL INFLUENCES 


Given precise figures which represent at least an approximate 
solution of the problem, the implications of the data may be developed 
in a multitude of directions. Space can be devoted to only the two 
most important of these implications. 





1 The points spotted were derived from $\sqrt{2r_{he}\sigma_h\sigma_e}$.










The data of Burks indicate very clearly that inter-family environmental
differences account for a much smaller proportion (from
fourteen per cent to eighteen per cent) of the variance or individual
differences in intelligence than do hereditary differences (from sixty-three
per cent to sixty-six per cent). Preliminary data of Leahy for
Minnesota children amply confirm the general trend of Burks’ findings.
The inferiority complex which many educators and environmentalists
have created for themselves by the misinterpretation of these and
similar data¹ is a most bizarre phenomenon. It does not follow that the
general level of the environment is a relatively unimportant factor in
determining the general level of intelligence, but only that environmental
differences are relatively small in comparison with hereditary differences
in determining individual differences in intelligence. Even if
environmental differences accounted for zero per cent and hereditary
differences for one hundred per cent of the individual differences in
intelligence, it would still be true that the general level of the environment
would be a most important factor determining the general level
of intelligence.

The significance of the small proportion of IQ differences to be
attributed to environmental differences is most easily grasped by a 
look backward at the progress of civilization. Whether a short or a 
long view is taken, the increase in the general level of the environment 
is extraordinary. The proportion of illiterates in the population is 
now approximately one-fifth of what it was in 1870. The number of 
children attending high school has multiplied nearly ten times as fast 
as the population in the last forty-five years. The per capita production
of industry in the depression year 1934 was more than four times
that of 1864. Available data suggest that the purchasing power of 
wages multiplied more than five times from 1840 to 1933. The average 
child of today has the advantage of physical and medical care and of 
books, schooling, and intellectual stimulation which were not even 
dreamed of a hundred years ago. At the same time it is probable
that the most superior homes in America today do not provide a 
quality of intellectual stimulation which is greatly superior to that 
provided by the homes which assisted in producing the genius minds 
of Leibnitz, Goethe, and John Stuart Mill. That is, the increase in 
the general level of the environment has been in large part a leveling 





1 For recent examples see: Freeman, F. S.: Individual Differences. New York: 
Holt and Co., 1934, p. 355. Sanders, B. S.: Environment and Growth. Baltimore: 
Warwick and York, 1934, p. 375. 

















up process. The curves at the top of Figure 7 translate these general 
observations into quantitative form. They assume that the intellectual
stimulation received by the average child of today is on a par
with that received by the most fortunate five per cent of the children of
one hundred years ago, that the intellectual stimulation received by 
the most fortunate five per cent of the children of today will represent 
the average a hundred years hence, and that this increase in the 

































FIG. 7. INCREASE IN IQ FOR ASSUMED INCREASE IN ENVIRONMENT
(Upper curves: environments; lower curves: IQ, on a scale from 20 to 140. Curves are drawn for the 99th and 1st percentiles. Horizontal axis: 1835, 1935, 2035.)


quality of the environment represents in large part a leveling up 
process. These assumptions appeal to the writer as reasonable, but 
the reader is at liberty to be more optimistic or conservative and to 
make them for fifty years ago and fifty years hence or for two hundred 
years ago and two hundred years hence. Let it also be assumed that 
the average level and variability of intellectual endowments has not 
and will not change during this two hundred year period. Then from 
these assumptions there follows the increase in the general level of 
intelligence also portrayed in Figure 7 (numerical data and certain 







technical details are given in Table III). The estimated increase 
in the average Stanford Binet IQ is fifteen points for the past century 
and seven points for the next or a total indicated increase in the 
average IQ of twenty-two points. Since the variability of IQ differ- 
ences becomes smaller there is a very great decline from 38.8 per cent 
to only 1.2 per cent in the proportions of children with IQ’s below 


TABLE III. CERTAIN ESTIMATES CONCERNING NATURE AND NURTURE ONE HUNDRED YEARS AGO AND ONE HUNDRED YEARS HENCE

| Variables | One hundred years ago | The present | One hundred years hence |
|---|---|---|---|
| Basic data and assumptions: | | | |
| $\sigma_h$ (SD of hereditary differences) | 10.922 | 10.922 | 10.922 |
| $M_h$ (Mean of intellectual endowments) | constant | constant | constant |
| $\sigma_a$ (SD of accidents and intra-family) | 4.163 | 2.498 | 1.499 |
| $\sigma_e$ (SD of inter-family environmental differences) | 9.625 | 5.775 | 3.465 |
| $M_e$ (Mean of inter-family environmental differences) | ¹ | ¹ | ¹ |
| $r_{he}$ (r of endowments and environments) | .3798² | .2392 | .1462² |
| Derivative data; correlations: | | | |
| $r_{xh}$ (IQ with hereditary differences) | .8291 | .8948 | .9504 |
| $r_{xa}$ (IQ with accidents, etc.) | .2368 | .1817 | .1247 |
| $r_{xe}$ (IQ with inter-family environments) | .7834 | .6100 | .4209 |
| Coefficients of determination (expressed as percentages): | | | |
| $d_{xh}$ (heredity) | 38.59% | 63.10% | 82.49% |
| $d_{xa}$ (accidents, etc.) | 5.61% | 3.30% | 1.56% |
| $d_{xe}$ (inter-family environment) | 29.97% | 17.64% | 8.30% |
| Correlation of endowments and environments | 25.83% | 15.96% | 7.65% |
| Standard deviation of IQ’s | 17.582 | 13.750 | 12.025 |
| Mean IQ | 85 | 100.0 | 107³ |
| Per cent of IQ’s one hundred forty and higher | .1% | .2% | .3% |
| Per cent of IQ’s one hundred twenty and higher | 2.3% | 7.3% | 14.0% |
| Per cent of IQ’s one hundred and higher | 19.7% | 50.0% | 72.0% |
| Per cent of IQ’s below eighty | 38.8% | 7.3% | 1.2% |

¹ The 1935 average environment is assumed as plus two sigma of 1835, the
2035 average environment is assumed as plus two sigma of 1935 or as plus 3.2 sigma
of 1835.

² From formula 186 of Kelley’s Statistical Method.

³ The regression equation is $x = 2\sigma_e r_{xe}$. For the past hundred years this gives
2 × 9.625 × .7834 = 15.08 and for the next hundred years 2 × 5.775 × .61 =
7.05.




eighty, a very considerable increase from 19.7 per cent to 72.0 per cent 
with IQ’s of one hundred or higher, a small increase from 2.3 per cent
to 14.0 per cent with IQ’s of one hundred twenty or higher, and a
negligible increase with IQ’s of one hundred forty or higher. Although 
by assumption the hereditary differences remain constant throughout, 
the proportion of the individual differences to be attributed to heredi- 
tary differences increases from thirty-nine per cent to eighty-two per 
cent. Conversely, the contribution of environmental differences to 
IQ variance declines from thirty per cent to eight per cent, reflecting 
the leveling-up process and the elimination of environmental differ- 
ences. That is, given a rising trend to the quality of the environment, 
then the smaller the proportion of IQ variance attributed to environ- 
mental differences, the greater the probable increase in the average IQ 
to be attributed to environment. It is to be emphasized that these 
data are entirely speculative. The assumed increase in the general 
level of the environment portrayed at the top of Figure 7 has some 
foundation in fact and appeals to the writer as not unreasonable, but 
the assumed constancy of the average level and variability of native 
intellectual endowments is wholly arbitrary. As we shall see later 
from a consideration of differential birth rates, the probabilities are 
that the average level of native intellectual endowments is now from 
two to four IQ points lower than one hundred years ago. Such 
speculative results, nevertheless, may be justified on two grounds. 
First, they demonstrate the important propositions that the average 
IQ may be very considerably increased by raising the general level 
of the environment and that such an increase in the average IQ is 
in no way inconsistent with, but rather follows from, data showing
that environmental differences account for only a small proportion 
of the individual differences in intelligence. Second, the data suggest 
that empirical checks be applied by testing the offspring of the chil- 
dren who were given the Stanford Binet fifteen to twenty years ago. 
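
The estimates of Table III may be retraced by a short calculation. The sketch below is illustrative and rests on the stated assumptions together with two further ones: that IQ’s are normally distributed about the rounded means of 85, 100, and 107, and that the gain in average IQ for a rise of two sigma in the environment is $2\sigma_e r_{xe}$, as in the third footnote to the table. The function and key names are hypothetical.

```python
import math
from statistics import NormalDist


def period(sd_h, sd_a, sd_e, r_he, mean_iq):
    """Illustrative summary of one column of Table III."""
    sd_x = math.sqrt(sd_h**2 + sd_a**2 + sd_e**2 + 2.0 * r_he * sd_h * sd_e)
    r_xe = (sd_e + r_he * sd_h) / sd_x
    iq = NormalDist(mean_iq, sd_x)
    return {
        "sd_x": sd_x,
        "r_xe": r_xe,
        # Gain in mean IQ if the average environment rises by two sigma
        "gain_from_two_sigma": 2.0 * sd_e * r_xe,
        "pct_140_up": 100.0 * (1.0 - iq.cdf(140)),
        "pct_120_up": 100.0 * (1.0 - iq.cdf(120)),
        "pct_100_up": 100.0 * (1.0 - iq.cdf(100)),
        "pct_below_80": 100.0 * iq.cdf(80),
    }


past = period(10.922, 4.163, 9.625, 0.3798, 85.0)
present = period(10.922, 2.498, 5.775, 0.2392, 100.0)
future = period(10.922, 1.499, 3.465, 0.1462, 107.0)

for label, column in (("1835", past), ("1935", present), ("2035", future)):
    print(label, {key: round(value, 2) for key, value in column.items()})
```

On these assumptions the sketch returns gains of about fifteen and seven points for the two centuries and percentages that agree with the tabled values to within rounding. The proportion of IQ’s of one hundred forty or higher comes out at a few tenths of one per cent throughout, which accords with the negligible increase noted in the text.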


IMPLICATIONS FOR HEREDITARY DIFFERENCES 


Since hereditary differences are responsible for from sixty-three 
per cent to sixty-six per cent of the IQ variance it would seem to 
follow that eugenic procedures offer great promise for improving the 
quality of intellectual endowments and the general level of intelligence. 
Assuming a constant environment and an increase in endowments 
until the average corresponds to that of the most fortunate five per 
cent of today, then the indicated increase in the average IQ is nineteen 










points, whereas, on the opposite of these assumptions, improvement 
in the environment can increase the average IQ only seven points. 
The data, however, suggest that an eugenic attack will encounter 
difficulties which so far have not been anticipated. Essentially these 
difficulties center around the correlation between endowments and 
environments and the rôle of parental intelligence in the complex of
relationships. The implications of the data in relation to differential 
birth rates will also be developed. 

The most important datum for the exposition of implications con- 
cerning hereditary differences is not the correlation between the child’s 
native endowment and his IQ (.90 to .92), but the correlation between 
the child’s native endowment and his environment which includes 
the intelligence of his parents. Three solutions of this correlation 
based on the data of Burks are available: .24, .27, and .35. In addition 
the preliminary data of Leahy suggest a correlation of from .40 to .50. 
While such diverse figures may seem disturbing, the general trend of 
the implications is not materially altered since all of the values are 
low in comparison with other important relationships. Of the three 
values obtained from Burks’ data, the writer’s judgment favors .35
as the closest approximation to the actual relationship among unse- 
lected children reared in the homes of their true parents and this value 
will be used in subsequent calculations. That is, the implications 
for hereditary differences rely on the solution of Burks’ data recorded
in column 5 of Table II. Circle C of Figure 5 portrays the propor- 
tional contributions of the four factors to IQ variance according to 
this solution. For this solution the correlation between the child’s 
IQ and his native intellectual endowment is .92 and the correlation 
between IQ and a battery of environmental measures is .65. Lest 
this selection of data seem arbitrary, parentheses will be used occa- 
sionally to indicate the more important differences which follow from 
taking a minimum value (.24) or a maximum value (.50) for the correla- 
tion between endowments and environments. 

The correlation of .35 between endowments and environments is 
the multiple correlation corrected for attenuation between the child’s 
native intellectual endowment and four environmental variables 
including the Whittier index, father’s mental age, father’s vocabulary, 
and mother’s mental age. The comparable correlation between the 
child’s native endowment and parental intelligence is necessarily 
slightly lower, the correlation uncorrected for attenuation still lower 
(.32), and the correlation between native endowments and such crude 













indices as occupational status a great deal lower. With a correlation 
of only .35, it follows that very little can be foretold concerning the 
native capacity of an individual child even on the basis of very exten- 
sive and exact information concerning the intelligence, education, 
and the economic and cultural status of his parents. When only 
crude indicators such as occupational status are available, it should 
be apparent that the data offer only slight support to the widespread 
assumption that the children of the professional, managerial, and 
proprietary classes are natively superior and that the children of the 
laboring, unskilled, and property-less classes are natively inferior. 

The low correlation of .35 between endowments and environments,
the latter being almost entirely parental intelligence, runs
counter to a very widespread misconception of the rôle of parental intelligence
in the complex of relations. The square of this correlation indicates
that only 12.3 per cent (minimum 5.7 per cent; maximum 25.0 per 
cent) of the variance in parental intelligence is representative of the 
native intellectual endowment of the child. On the other hand Burks 
reports the correlation between parental intelligence and the Whittier 
index as .7653. It is probable that the multiple correlation of parental 
intelligence with the Whittier index, culture score, and family incomes 
is .80 or higher indicating that at least sixty per cent of the variance 
in parental intelligence is representative of environmental factors. 
Another way of approaching this situation is as follows: The correla- 
tion between IQ’s and four environmental variables including two 
measures of parental intelligence and father’s vocabulary is .65. 
The square of this correlation indicates that parental intelligence and 
other environmental factors account for 42.6 per cent of the variance 
in IQ. Of this proportion, 13.9 per cent is to be attributed to environ- 
mental differences alone, 20.9 per cent is to be attributed to the correla- 
tion of .35 between endowments and environments or to the joint 
and inextricable contribution of hereditary and environmental differ- 
ences working together (see circle C of Figure 5), leaving only 7.8 per 
cent (minimum 3.6 per cent, maximum not available) to be attributed 
to hereditary differences alone. It follows that parental intelligence 
is only in small part an index to the child’s heredity. In larger part 
it is an index to the child’s environment. Or, children resemble their 
parents in intelligence not primarily because superior parents pass on 
superior intellectual endowments but even more because superior 
parents provide a high order of intellectual stimulation. If these 
results seem strange, it is to be noted that they conform to the realities 





| 
| 




















678 The Journal of Educational Psychology 


and logic of the situation. Parental intelligence is indisputably a 
continuing and important environmental factor, doubtless the most 
important environmental factor, stimulating or repressing the intel- 
lectual development of the child. 

An effective eugenic program presupposes methods of predicting 
the native intellectual endowments of the offspring of prospective 
parents. A prediction based on a very extensive battery of intelligence 
tests applied to the prospective parents plus knowledge of their educa- 
tion, and their economic and social status would eliminate only 6.3 
per cent (minimum 2.9 per cent, maximum 13.4 per cent) of the error 
involved in outright guessing concerning intellectual endowments. 
More concretely, let us define native intellectual endowment as the 
IQ to be expected if children were reared in average and strictly 
identical environments. Then, in the absence of any data whatever 
on a particular child the best guess would be that his ‘‘native 1Q” 
is one hundred. Such sheer guesses on a large number of children 
would be in error by 8.2 or more IQ points in fifty per cent of the cases. 
A prediction of native endowments based on a very extensive battery of 
tests applied to parents would reduce this error only .5 of a point 
(minimum .2, maximum 1.1). It does not, however, follow that an 
eugenic program can accomplish nothing. Denying parenthood to 
the five per cent with the lowest intelligence would raise the average 
intellectual endowment .46 IQ points in a generation. Similarly, 
denying parenthood to the fifty per cent with the lowest intelligence 
would raise the average intellectual endowment 3.43 IQ points in a 
generation (minimum 2.3, maximum 4.9). These are small gains 
in the light of the fact that environmental differences alone in the 
middle fifty per cent of homes raise or lower the child’s IQ as much 
as 3.90 points. If account is taken of the fact that these calculations 
are based on true scores and correlations corrected for attenuation, 
the actual gains will be smaller than indicated. It is improbable that 
society will tolerate the segregation or sterilization on eugenic grounds 
of five per cent of the population when such small gains result. If, 
in addition, an eugenic program takes account of other factors such as 
hereditary diseases, defects, and disabilities it should be apparent 
that such a program must look for its fruition in the long, long centuries 
of the future. 
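
The reduction in the error of prediction cited above may be retraced as follows. The sketch assumes that the proportion of error eliminated by a correlation $r$ is $1 - \sqrt{1 - r^2}$, and that the error of a sheer guess is the probable error, .6745 times the standard deviation of native endowments (12.226, column five of Table II). The function name and constant names are hypothetical.

```python
import math

SD_ENDOWMENT = 12.226                      # sigma_h, column five of Table II
PROBABLE_ERROR = 0.6745 * SD_ENDOWMENT     # error exceeded in half the cases


def error_eliminated(r):
    """Assumed measure: proportion of guessing error removed by r."""
    return 1.0 - math.sqrt(1.0 - r * r)


for label, r in (("preferred", 0.35), ("minimum", 0.24), ("maximum", 0.50)):
    share = error_eliminated(r)
    print(label,
          "per cent eliminated:", round(100.0 * share, 1),
          "IQ points saved:", round(PROBABLE_ERROR * share, 2))
```

On these assumptions a correlation of .35 removes a little over six per cent of an error of about 8.2 IQ points, or roughly half a point, and the minimum and maximum correlations give values near those cited in parentheses.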

While these data place severe restrictions on the promise of an 
eugenic program, it is to be noted that segregation or sterilization may 
be justified on other grounds. Denying parenthood to the five per 










cent with the lowest intelligence, the least education, and the poorest 
economic status, would automatically eliminate the five per cent of 
poorest environments and this alone would raise the average by .62 IQ 
points per generation. This factor plus the gain from improved 
endowments indicates a total gain in a generation of 1.08 IQ points¹
(minimum 1.01, maximum not available). 

We turn now to the implications of the data in relation to differ- 
ential birth rates. Assume that all families are divided into three 
equal groups according to a very extensive battery of intelligence tests 
and assume that the upper third average 2.1 children per family, the 
middle third average 2.8 children per family, and the lower third 
average 3.3 children per family. Such a differential birth rate quite 
obviously tends to lower the average IQ partly because native intel- 
lectual endowments are lowered and partly because 40.2 per cent 
of the children are reared in the poorest third of the environments 
while only 25.6 per cent are reared in the best third of the environments. 
Calculation indicates that such a differential birth rate lowers the 
native intellectual endowment .69 IQ points per generation, that the 
differential environmental influence lowers intelligence .92 IQ points
per generation, and that the combined influence of the two factors 
lowers intelligence 1.61 IQ points per generation (minimum 1.51, 
maximum not available). There is cause for deep concern here, but 
it is to be particularly noted that the environmental influence of the 
differential birth rate is more important than the hereditary influence. 
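
One way of retracing the figures for the differential birth rate is sketched below. It is not necessarily the calculation actually made. It assumes that parental standing on the battery is normally distributed, that each third of families is represented by the mean sigma deviation of that third, and that the resulting mean deviation of the children’s parentage is converted to IQ points by the regression weights of column five of Table II ($\sigma_h r_{he}$ for heredity, $\sigma_x p_{xe}$ for environment). All names in the code are hypothetical.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist()


def mean_of_upper_tail(fraction):
    """Mean sigma deviation of the highest `fraction` of a normal curve."""
    cut = UNIT_NORMAL.inv_cdf(1.0 - fraction)
    return UNIT_NORMAL.pdf(cut) / fraction


children_per_family = {"upper": 2.1, "middle": 2.8, "lower": 3.3}
total_children = sum(children_per_family.values())
share = {group: n / total_children for group, n in children_per_family.items()}

# Upper and lower thirds are symmetrical about a middle third with mean zero.
z_third = mean_of_upper_tail(1.0 / 3.0)
mean_parent_z = z_third * (share["upper"] - share["lower"])

change_heredity = mean_parent_z * 12.226 * 0.3529      # sigma_h * r_he
change_environment = mean_parent_z * 15.457 * 0.3736   # sigma_x * p_xe
change_total = change_heredity + change_environment

print({group: round(100.0 * s, 1) for group, s in share.items()})
print(round(change_heredity, 2), round(change_environment, 2), round(change_total, 2))
```

On these assumptions 40.2 per cent of the children fall in the lowest third and 25.6 per cent in the highest, and the three changes come out negative and close to .69, .92, and 1.61 IQ points per generation.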





1 The three regression equations are as follows:

$$x_h = y \times 12.226 \times .3529 = .1075 \times 12.226 \times .3529 = .4638 \text{ IQ points}$$
$$x_e = y \times 15.457 \times .3736 = .1075 \times 15.457 \times .3736 = .6208 \text{ IQ points}$$
$$x_{he} = y \times 15.457 \times .6528 = .1075 \times 15.457 \times .6528 = 1.0847 \text{ IQ points}$$

in which $x_h$, $x_e$, and $x_{he}$ are the estimated $x$ deviations due to hereditary differences,
to environmental differences, and to both as measured by a battery of environmental
measures including parental intelligence and in which $y$ is the mean sigma
deviation from the mean when the lowest 5 per cent of the population is eliminated.
These regression equations were used to estimate the component contributions to
IQ variance of parental intelligence. Let $y = 1.0$; then $x_h = 4.314$, $x_e = 5.775$,
and $x_{he} = 10.090$. Now, $x_{he}^2 = (x_h + x_e)^2 = x_h^2 + 2x_hx_e + x_e^2$, and dividing
these by $\sigma^2$ gives the percentages cited in the text.
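
The arithmetic of footnote 1 can be checked with a short Python sketch. The dictionary keys and variable names below are illustrative assumptions; only the numerical factors come from the footnote, and the final division by the variance is omitted because the divisor is not legible in the source.

```python
# Figures from footnote 1: y, followed by the two remaining numerical
# factors of each regression product.
y = 0.1075
factors = {
    "heredity": (12.226, 0.3529),
    "environment": (15.457, 0.3736),
    "both": (15.457, 0.6528),
}

# Estimated deviations in IQ points for the stated y, and for y = 1.0.
deviation = {k: y * s * r for k, (s, r) in factors.items()}
unit = {k: s * r for k, (s, r) in factors.items()}

for k in factors:
    print(f"{k:>11}: {deviation[k]:.4f} IQ points (y = 1.0 gives {unit[k]:.3f})")

# Terms of the expansion (x_h + x_e)^2 = x_h^2 + 2 x_h x_e + x_e^2 for y = 1.0.
terms = {
    "heredity alone": unit["heredity"] ** 2,
    "joint": 2 * unit["heredity"] * unit["environment"],
    "environment alone": unit["environment"] ** 2,
}
for name, value in terms.items():
    print(f"{name:>17}: {value:.2f}")
```

The sum of the hereditary and environmental deviations agrees with the combined deviation only to about the third significant figure, so the expansion should be read as a close approximation rather than an exact identity.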

2 Stating the problem in terms of assumptions permits us to disregard a host of 
minor complicating factors. The figures cited are based on 1929 birth statistics 
and represent the writer’s best estimate of the average number of surviving children 
in completed families when occupation is taken as an index of intelligence. 







IMPLICATIONS FOR FUTURE RESEARCH 


The most important implication of this analysis for future attacks 
on the nature versus nurture problem is the suggestion that data be 
collected on pairs of foster siblings having different heredities but 
reared together in the same foster homes. Data on identical twins 
reared together in the same family provide a measure of the contribu- 
tion of accidents and of intra-family environmental differences, but, 
aside from this, the value of such data has been very greatly exagger- 
ated since their resemblance is to be attributed to the joint and 
inextricable contributions of both the same heredity and the same 
inter-family environment. Only the resemblance of foster siblings 
and of identical twins reared apart can give an approximation to 
unequivocal data. It is to be noted that data from identical twins 
reared apart are subject to precisely the same limitations as foster 
sibling data. In both cases placement at an early age is imperative 
and the influence of selective factors must be controlled. Foster 
sibling data have the very important advantage that such data should 
be easier to collect. Burks found twenty-one pairs and Freeman 
found seventy-two pairs without apparently searching for them, 
whereas at the last accounting only twenty-seven pairs of identical 
twins reared apart had been found. It follows that it should be 
possible for one investigator to administer the same tests to all pairs 
under comparable conditions, whereas the available data on identical 
twins reared apart have been collected by many different investigators, 
working under different conditions, and using different tests. Finally, 
data on foster siblings have two important advantages in comparison 
with batteries of environmental measures for the purpose of deter- 
mining the contribution of environmental differences to variance: 
first, it should be possible to obtain more accurate reliabilities for the tests
applied to the foster siblings than for the environmental measures; second,
the resemblance of foster siblings measures the cumulative influence
of environmental differences, whereas a battery of environmental
measures does not.

The suggestion has also been made that we look forward in the next 
ten years to testing the offspring of children who were given the Stan- 
ford Binet fifteen to twenty years ago. Any institutions or school 
systems which have considerable numbers of IQ records for the period 
prior to 1920 might well explore the possibilities of locating these 
individuals for the purpose of testing their offspring. If the influence
of differential birth rates is to be considered, it will be important not
to discard the IQ’s of those individuals tested prior to 1920 who have 
no offspring and either to test all the offspring of a given original 
testee or to weight the IQ’s of such offspring as can be tested by the 
total number of offspring. That is, if a comparison is made only 
between IQ’s of parent and one offspring, the writer would anticipate 
an average increase of from one to three IQ points in a single genera- 
tion as measured by the Stanford Binet; whereas, if differential birth 
rates are considered it might turn out that the losses due to this factor 
would wipe out the gains due to the presumptive increase in the level 
of the intellectual environment. 
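
The difference between the two comparisons can be illustrated with a small Python sketch. All of the records below are hypothetical and chosen only to show the mechanics of weighting; they are not data from any study cited here.

```python
# Hypothetical records: (parent IQ, total number of offspring,
# IQ of the one offspring tested, or None when there are no offspring).
records = [
    (120, 1, 122),
    (105, 2, 107),
    (95, 3, 97),
    (85, 4, 87),
    (110, 0, None),  # childless testee: retained in the parent average
]

# Parent generation: every original testee counts once, with or without offspring.
parent_mean = sum(p for p, _, _ in records) / len(records)

# Pairwise comparison: parents who have offspring versus one tested child each.
pairs = [(p, n, c) for p, n, c in records if c is not None]
paired_parent_mean = sum(p for p, _, _ in pairs) / len(pairs)
unweighted_child_mean = sum(c for _, _, c in pairs) / len(pairs)

# Weighted comparison: each tested child stands for all n children of the family.
weighted_child_mean = sum(n * c for _, n, c in pairs) / sum(n for _, n, _ in pairs)

print(f"all original testees:      {parent_mean:.2f}")
print(f"parents with offspring:    {paired_parent_mean:.2f}")
print(f"one child per family:      {unweighted_child_mean:.2f}")
print(f"children weighted by size: {weighted_child_mean:.2f}")
```

In this invented example each tested child exceeds its parent by two points, yet the weighted mean of the second generation falls below the mean of all original testees, because the larger families lie at the lower end of the scale.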








THE RELATIONSHIP OF NUMBER OF HOURS OF 
STUDY TO SCHOLARSHIP 


E. G. WILLIAMSON 


University of Minnesota 


Students who are low in college scholarship often explain their 
predicament by saying, “I can’t understand why my grades are so low.
I try hard and study all the time; but it doesn’t seem to do any good.” 
Such statements should be received skeptically by faculty counselors, 
not because the student is dishonest, but because he is in no position 
to make a valid estimate as to the number of hours he studies each 
week. This is another case where general estimates are likely to be in 
error. Students remember the few hours when they studied and fail 
to compare these few hours with the many hours they give to leisure 
and miscellaneous activities. Then too the hustle and rush of student 
activities and classes creates the illusion of long hours of study. The 
student is doing something every minute of the day, so he reasons 
that he must have studied many hours. 

Such an illusion was firmly fixed in H. T., who had been doing “C”
and “D” work in her courses during her two years in college. She had
the superior rating of the eighty-seventh percentile in the Minnesota college
aptitude test but only the eighteenth in high-school scholarship. She had
taken part in college dramatics and was also active in campus politics. 
She insisted at first that she was giving plenty of her time to studies 
and that she was working hard. But when she had kept an accurate 
record of her hours of study for one week and saw that she had studied 
only thirteen hours and fifty minutes she began to see that her standard 
of “plenty of time” was inadequate for college students. Moreover,
when she did study, her efforts were undermined by day-dreaming and 
worries about conflicts she was having with her parents. 

Freshmen in the College of Science, Literature and the Arts 
recorded for one week the actual distribution of their time. A record 
form for the recording of various activities and the time devoted to 
each was prepared. This form is modeled after the Northwestern 
University time sheet as reported by L. B. Hopkins.¹ A total of two
hundred fifty-seven freshmen, one hundred seventy-four men and 
eighty-three women, recorded, for the week of October 24-30, 1929, 
the time they gave to study, classroom and laboratory, social engage- 
ments, participation in campus activities, leisure, and outside work 
and home duties. 



The average (or fiftieth percentile) student claimed he spent 26.4 
hours per week in study, twenty-one hours in the classroom and 
laboratory, eight hours in social engagements, 1.5 hours in campus 
activities, 6.5 hours in outside work and home duties, and 10.3 hours 
in leisure time activities for this one week. These data vary from 
those found by Crawford for Yale freshmen for the week of April 12-19, 
just following mid-semester examinations. Our data were collected 
for the week just preceding fall mid-quarter examinations. Crawford²
found that the average Yale freshman gave 18.3 hours to study, ten 
hours to activities, 36.4 hours to leisure and recreation, and fifteen 
hours in the classrooms. 

A further comparison of the Minnesota data regarding hours of
study with results from other institutions is made in Table I. The
Yale data were collected in the spring after examinations. Furthermore,
as other studies to be cited have shown, students of superior
ability tend to study fewer hours than students of lower ability; Yale
freshmen in the past have been shown to be intellectually superior to
a group of Minnesota freshmen.³ No reason for the surprisingly large
average number of hours of study at the University of Iowa is known.


TABLE I. COMPARING THE HOURS OF STUDY OF STUDENTS IN DIFFERENT
COLLEGES AND UNIVERSITIES

                      Science, literature   Yale²       Syracuse     University
                      and arts freshmen,    freshmen    University   of Iowa
                      University of         (spring)    freshmen⁵    freshmen⁴
                      Minnesota,                        (fall)       (fall)
                      1928-1929 (fall)

Mean                        27.09            18.3        24.0         31.4
Standard deviation           8.79             7.6         6.0          9.0
Number                     257              221         450          130

Many surveys of the average number of hours students give to 
study are reported in terms of what the individual does on an 
“average” day. Our data have been analyzed for the two hundred
fifty-seven freshmen by the days of the week; the first five days of the 
week are alike or almost alike in respect to the number of hours of 
study of the average freshman. The average hours of study for 
these days are: Monday 4.8; Tuesday 4.4; Wednesday 4.8; Thursday 
4.8; Friday 4.2. There is a marked decrease in the average on Satur- 
day, 2.3, with an increase on Sunday to 3.9 hours. 






















































































The significance of number of hours of study per week for college 
scholarship has been studied by two methods. In the first place, the 
average number of hours of study per week for the upper, middle, and 
lower third levels of ability according to intelligence test, has been 
computed. Jones and Ruch report a significant decrease in number of 
hours of study with an increase in level of academic intelligence as 
determined by a battery of four tests. This increase in number of 
hours of study is accompanied by a decrease in scholarship. In other 
words, fifty students below the fifteenth percentile in intelligence 
studied an average of 34.4 hours during the particular week under
investigation but received an average of only 7.0 grade-points for the
semester. Fifty freshmen between the forty-third and fifty-sixth
percentiles in intelligence studied an average of 33.7 hours per week
and received an average of 23.8 grade-points. Fifty students above the
ninetieth percentile in intelligence studied an average of 29.1 hours per
week and received 46.1 grade-points on the average. If we assume 
that the study records are valid, it seems that for students inferior in 
mentality to study more hours each week does not compensate for 
their handicap of lower mental capacity. 

The second method of determining the significance of hours of study 
for scholarship is the use of coefficients of correlation. Three exten- 
sive studies using this method have been reported in the literature 
(Table II), namely those of May⁵ at Syracuse University, Crawford²
at Yale, and Jones and Ruch⁴ at the University of Iowa. May had
four hundred fifty freshmen in educational psychology classes estimate 
the number of hours per week that were given to study. These 
judgments were made at the beginning and again at the middle of the 
first semester of college. The reliability of these estimates of time of 
study was 0.86, a surprisingly high reliability. These same freshmen 
were given the Miller Mental Ability Test and the Dartmouth Com- 
pletion of Definitions Test, the raw scores on the two tests being added 
to get the student’s total score. The relation between intelligence and 
the first semester scholarship of these freshmen was +.60. Hours of 
study correlated +.32 with scholarship. When hours of study is kept 
constant by partial correlation technique, the relation between intelli- 
gence and scholarship is .80. By multiple correlation, May found that 
the relation between scholarship and the combined effect of intelligence 
and hours of study is .82. 
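
The partial and multiple coefficients reported by May can be reproduced from the three zero-order correlations. The Python sketch below uses the standard textbook formulas for a first-order partial correlation and a two-predictor multiple correlation; these are assumed here for illustration and are not necessarily the computational routine May followed. The coefficient of -.35 between intelligence and hours of study is read from Table II, and the function and variable names are illustrative.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))


def multiple_r(r_12, r_13, r_23):
    """Multiple correlation of variable 1 with variables 2 and 3 combined."""
    r_squared = (r_12**2 + r_13**2 - 2 * r_12 * r_13 * r_23) / (1 - r_23**2)
    return sqrt(r_squared)


# May's zero-order coefficients: 1 = scholarship, 2 = intelligence,
# 3 = hours of study.
r_12 = 0.60   # scholarship with intelligence
r_13 = 0.32   # scholarship with hours of study
r_23 = -0.35  # intelligence with hours of study (from Table II)

r_12_3 = partial_r(r_12, r_13, r_23)  # hours of study held constant
R_1_23 = multiple_r(r_12, r_13, r_23)
print(f"r12.3 = {r_12_3:.2f}, R1(23) = {R_1_23:.2f}")
```

Because intelligence and hours of study are negatively related in May's group, holding hours constant raises the intelligence-scholarship relation from .60 to .80, and the two predictors together yield the multiple coefficient of .82.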

TABLE II. THE RELATIONSHIPS BETWEEN COLLEGE SCHOLARSHIPS, MENTAL
TESTS, AND HOURS OF STUDY AS DETERMINED BY INVESTIGATION IN DIFFERENT
UNIVERSITIES

           Four hundred     Two hundred       One hundred       One hundred
           fifty Syracuse   twenty-one Yale   five University   thirty University
           University       University        of Minnesota      of Iowa
           freshmen,        freshmen,         freshmen,         freshmen,
           (fall) 1923      (spring) 1926     (fall) 1929       (fall) 1928

r12          .60              .28               .65               .69
r13          .32              .00              -.06              -.28
r23         -.35             -.15              -.20              -.41
r12.3        .80              .28               .65               .66
r13.2        .70              .04               .11               .0044
r23.1       -.72             -.15              -.22              -.32
R1(23)       .82              .28               .66               .69

Note: (1) Quarter or semester scholarship. (2) Mental test rating. (3)
Total number of hours of study for one week.


Crawford collected data regarding the actual distribution of
students’ time by having Yale undergraduates record what they did
for the week of April 12-19, 1926. Our interest is particularly centered
in the two hundred twenty-one freshmen of his total group. His
results do not agree with those obtained by May at Syracuse, all of his
coefficients of correlations being lower. The correlation between hours
of study and scholarship is 0.00 as compared with the coefficient of
+.32 found by May. When intelligence is partialled out of this
relationship, the coefficient is reduced to .04, whereas the corresponding
partial for May’s data is .70. Furthermore, Crawford’s multiple r
is only 0.28 as compared with May’s 0.82. Crawford and May
explain these differences as probably due to different methods of computing
scholarship, honor points by May and average grades by
Crawford; greater heterogeneity in academic ability between students
at the two universities which would lower the correlation; and to the
fact that May’s subjects estimated the time spent in study while
Crawford’s students actually kept a record of this time. An additional
reason may be of some importance.

Crawford made his study in the spring during which period of the 
college year it is possible that students are not motivated scholastically 
to the same extent that they are in the fall or winter. A lower degree 
of scholastic motivation, together with the influence of restriction in 
range of ability, may account for Crawford’s lower correlations. The 
possible influence of the motivation factor is shown by the smaller number
of hours of study at Yale. 



















This latter hypothesis, in part at least, is supported by the results 
of similar studies at Iowa and Minnesota. The results of the Iowa 
study on one hundred thirty freshmen in the fall semester more nearly 
approximate those of May than those of Crawford, with the exception 
of the coefficient of -.28 between scholarship and hours of study.
This latter coefficient probably is due to the negative relation between
intelligence and hours of study (-.41) since the partial coefficient
between scholarship and hours of study with intelligence held constant 
is .0044 at Iowa. The Iowa multiple coefficient is 0.69 as compared 
with 0.82 at Syracuse and 0.28 at Yale. 

A similar analysis of the significance of hours of study for scholar- 
ship was made at the University of Minnesota. Because these data 
were collected for the week just prior to mid-quarter examinations, it is 
probable that these students studied more than was typical for other 
weeks in the quarter. Although this week may not be representative 
for the entire fall quarter, yet these data have the advantage of being 
collected at a time when the students are highly motivated scholastic- 
ally. If mid-quarter examinations are important in determining the 
final quarter grades and if hours of study have any relation to grades, 
in this situation the relationship should be revealed near its maximum 
extent, since the factor of motivation is controlled to a fairly satis- 
factory extent. 

The reliability of 0.73 for the time sheet data was computed by 
the split-half method, sum of hours of study of Monday, Wednesday, 
Friday, and Sunday versus sum of hours of Tuesday, Thursday, and 
Saturday. The split-half reliability for the college aptitude test was 
computed by Paterson and Drake as 0.95. The reliability of college 
marks has been computed by various investigators by correlating the 
first semester or quarter’s average grades with those of the second 
semester or quarter. Toops⁷ found that the average reliability coefficient
of first and second semester for seventeen colleges was 0.66.
The reliability of grades (first semester versus second) of one hundred
thirty-nine girls in the College of Science, Literature, and the Arts at
the University of Minnesota for 1919-1920 was found by Van Wagenen
to be 0.78.⁸
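
The split-half procedure described for the time sheets may be sketched as follows. The daily hours below are hypothetical and serve only to show the mechanics. The Spearman-Brown step-up is a common adjustment that is assumed here for illustration; the source does not state whether the reported 0.73 includes it, and with halves of four and three days it is only an approximation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical daily study hours, Monday through Sunday, for five students.
week = [
    [5.0, 4.5, 5.0, 4.0, 4.5, 2.0, 4.0],
    [3.0, 3.5, 2.5, 3.0, 3.0, 1.0, 2.0],
    [6.0, 5.0, 6.5, 6.0, 5.5, 3.5, 5.0],
    [4.0, 4.0, 3.5, 4.5, 3.0, 2.5, 3.5],
    [2.0, 3.0, 2.5, 2.0, 2.5, 3.0, 1.5],
]

# Halves as described in the text: Mon, Wed, Fri, Sun versus Tue, Thu, Sat.
half_a = [sum(days[0::2]) for days in week]
half_b = [sum(days[1::2]) for days in week]

r_halves = pearson_r(half_a, half_b)
# Spearman-Brown step-up to full length (an assumption, see the note above).
r_full = 2 * r_halves / (1 + r_halves)
print(f"half versus half: {r_halves:.2f}; stepped up: {r_full:.2f}")
```

The same routine applies to the grade reliabilities cited above if first-semester and second-semester averages are substituted for the two halves of the week.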

Unfortunately complete data were available for only one hundred 
five of these two hundred fifty-seven freshmen and the following 
analysis is made upon this number of students. In the main the 
results tend to agree with the Iowa study rather than the Yale and 
Syracuse studies. The coefficient between hours of study and scholarship
(honor point ratio for the fall quarter) is only -.06, and the
hours of study is correlated -.20 with academic intelligence, as
measured by the Minnesota college aptitude test. The partial 
correlation between number of hours of study and scholarship with 
intelligence constant is 0.11, as compared with .0044 at Iowa, .04 at 
Yale, and .70 at Syracuse. The multiple correlation for the Minnesota 
data is 0.66. 
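
The four partial coefficients compared above can be recomputed from the zero-order correlations of Table II. The Python sketch below applies the standard first-order partial correlation formula, assumed here for illustration; the dictionary layout and names are not from the original studies.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """Partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))


# (r12, r13, r23, reported r13.2) with 1 = scholarship,
# 2 = mental test rating, 3 = hours of study, as read from Table II.
studies = {
    "Syracuse": (0.60, 0.32, -0.35, 0.70),
    "Yale": (0.28, 0.00, -0.15, 0.04),
    "Minnesota": (0.65, -0.06, -0.20, 0.11),
    "Iowa": (0.69, -0.28, -0.41, 0.0044),
}

computed = {}
for name, (r12, r13, r23, reported) in studies.items():
    # Hours of study with scholarship, intelligence held constant.
    computed[name] = partial_r(r13, r12, r23)
    print(f"{name:>9}: computed {computed[name]:.4f}, reported {reported}")
```

The recomputed values fall within about .02 of the reported ones. The small discrepancies, such as that for Minnesota, presumably arise because the published zero-order coefficients are rounded to two places.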

Because of the factors of greater range in academic intelligence 
and the probable better scholastic motivation of the Minnesota 
students it is probable that these coefficients approximate the “true”
relationships of intelligence, study hours, and scholarship to a greater 
extent than do the other studies. If this assumption be true, we may 
conclude that, beyond a minimum number, varying with the level of 
academic intelligence, the hours of study a student gives to his scho- 
lastic work have less significance than academic intelligence. A stu- 
dent of low ability will need to study more hours in order to do passing 
work. But an increase in the number of hours of study by this 
student of low ability will not necessarily result in much higher 
scholarship. Piling up the number of hours of study will not com- 
pensate for low academic ability. Counselors who attempt to motivate 
students scholastically need to keep in mind that beyond a total of,
say, twenty or thirty hours of study a week, an increase in hours of
study a week will not improve the student’s scholastic standing and 
may actually result in emotional disturbances. Experience in counsel- 
ing students leads one to conclude that a minimum of eighteen to 
twenty hours and a maximum of thirty to thirty-five hours of study 
a week should permit students to get the grades that their academic 
aptitude makes possible. Within these limits, improvement in study 
skills, reading habits, and interest and pride in studying for the 
sake of being well trained professionally are the important factors to 
note in any attempt to improve the scholarship of students. 


REFERENCES 


1. Hopkins, L. B.: “Personnel Work at Northwestern University.” Journal of
Personnel Research, Vol. I, 1922, pp. 277-288.

2. Crawford, A. B.: Incentives to Study. Yale University Press, 1929, p. 20,
Table V.

3. Haggerty, M. E.: “Student Ability and Its Measurement.” Problems of
College Education. University of Minnesota Press, 1928, p. 225, Table XIII.

4. Jones, L. and Ruch, G. M.: “Achievement as Affected by Amount of Time
Spent in Study.” 27th Yearbook, National Society for the Study of Education,
Part II, 1928, pp. 131-134.

5. May, M. A.: “Predicting Academic Success.” Journal of Educational
Psychology, Vol. XIV, 1923, pp. 429-440.

6. Paterson, D. G. and Drake, L. E.: Reliability Analysis of the 1929 Form of the
Minnesota College Ability Test. Unpublished report.

7. Toops, H. A.: “The Status of University Intelligence Tests.” Journal of
Educational Psychology, Vol. XVII, 1926, pp. 22-36 and 119-124.

8. Van Wagenen, J. M.: “Some Results and Inferences Derived from Use of Army
Tests at the University of Minnesota.” Journal of Applied Psychology,
Vol. IV, 1920, pp. 59-72.










WHAT SHOULD BE INCLUDED IN EDUCATIONAL 
PSYCHOLOGY? 


NOEL B. CUFF 


Eastern Kentucky State Teachers College 


I. INTRODUCTION


Casual comparisons and careful analyses of textbooks show there 
is but little agreement as to content and organization of introductory 
courses in educational psychology. The major elements included 
may be poached from courses in general psychology, tests and measurements,
mental hygiene, adolescence, or in methods of teaching.¹ For
a course in educational psychology, a basal text prepared for almost 
any department may be used. But comparative studies of textbooks 
on educational psychology also show that they are likely to have a low 
degree of community of content. One study,’ for example, shows that 
Starch and Gates agree on less than thirty-three per cent of the mate- 
rials they embody. 

Furthermore investigations show that the interest for the majority 
of students in items taught in introductory courses in psychology is 
likely to lie between “little” and “moderate,” often nearer the former
than the latter.² Teachers being taught to be teachers may be repelled
by such rubrics as nonsense-syllables, auditory theories, animal learn- 
ing, Weber’s law, threshold differences, and other knowledge not 
closely related to school conditions. They may quote with glee the 
definition that educational psychology consists in “putting what
everybody knows in language nobody can understand.”³ Or they may
meekly regurgitate relatively useless and stereotyped content and then 
relegate it to a mental ash heap. 

Such findings have caused some psychologists to admit that psychology
courses are often “unpsychologically” organized and taught.
Hence, Seashore⁴ quotes: “Thou that teachest others, teachest thou
thyself?” And Watson⁵ suggests: “Physician, heal thyself.”

It would seem that as a matter of professional self-respect psy- 
chologists should attempt to determine what psychology is of the most 
worth in the training of teachers. An attack on the problem with as 
scientific techniques as would be used in determining how to run a 
bassoon factory or in building a curriculum in other fields might 
yield a large amount of agreed upon material which could be care- 
fully articulated. 




II. PROBLEMS


The problems involved in this study are: (1) What chapter headings
do psychologists think, from their study, experiments, and experiences
with teachers in training, should be used to indicate the content
of an introductory course in educational psychology? (2) Is there a
consensus as to the content which should be included in educational
psychology? (3) What types of organization of exercises, experiments,
and references are preferred by instructors?


III. INVESTIGATION


The method used in this study consisted in determining the major
topics or chapter headings in eighteen recent textbooks in educational
psychology. The tabulation of chapter headings in texts published
during the last five years resulted in a total of over two hundred fifty
different headings; there was, however, variability and overlapping as
to the items discussed under a given heading. Some of the titles were
used in a number of books. Others were very original. It was
possible, however, from a rough analysis to reduce these titles to seventeen
median or modal headings. These frequently used chapter
headings together with a few of the other most used or recent synonymous
headings (a multiple-choice grouping of about seventy titles) in
each instance were arranged in a question list. This list, as indicated
in Table I, was sent to professors of educational psychology in institutions
accredited by the American Association of Teachers Colleges
with the request:

“Please check in the column after the following chapter headings
each chapter heading that you think should be used in a textbook for
an Introductory Course in Educational Psychology.”

It is of course generally conceded that data from questionnaires are
of doubtful reliability. But ninety-seven replies were received from
one hundred sixty-four requests and they, with very few exceptions,
suggested that the people who did the answering were interested in
the problem, professionally minded, and had checked the list as accurately
as possible. Comments such as these were frequently added
to the blanks: “Good wishes for your study.” “I shall be glad to
aid you in any way and hope I will have the pleasure of reading your
finished work.” “I am returning list unanswered because our educational
psychology is neither introductory nor educational. We use
two books in general psychology, one in educational, one on adolescence,
and one on tests and measurements for a year’s work.” “I am
checking your list without reference to the text (Gates) I use.” Obviously
such reactions indicate serious attempts to evaluate the material
in the question list.

The chapter headings most frequently used in the textbooks 
examined are:* Introduction, Nature and Nurture, Growth, Motiva- 
tion, Emotion, Mental Hygiene, Intelligence, Individual Differences, 
The Learning Process, Guidance of the Learning Process, The Hygiene 
of Work, Transfer of Training, The Higher Intellectual Responses, 
The Measurement of Achievement, Elements of Statistical Methods, 
and Personality and Character. 


TABLE I. CHAPTER HEADINGS AND CONTENT PREFERRED BY PROFESSORS FOR AN
INTRODUCTORY COURSE IN EDUCATIONAL PSYCHOLOGY*

                                              Heading
Chapter headings                              preference,   Content
                                              per cent      rank

The field of educational psychology              75           4
Heredity and environment                         66           2.5
[illegible]                                      59          11.5
Growth and development                           75           5
Feelings, attitudes, emotions                    72          11.5
[illegible]                                      59           9.5
Intelligence and its measurement                 52           9.5
Individual differences and the school            74           2.5
[illegible]                                      66           8
Economy and efficiency in learning               74           6
Factors influencing learning                     72           1
[illegible]                                      61           7
Reasoning, imagining and problem solving         56          14
The measurement of learning                      59          15
[illegible]                                      56          13

* Tabulations by Naomi Kalb and Evelyn Reynolds.


Table I shows the chapter headings which teachers of educational 
psychology think should be used. Unless the same item in a given 
group was checked on fifty per cent of the lists returned the entire 
group has been omitted. This explains the absence of a chapter 
heading on “Statistical Methods.” It follows then that the titles 
included in Table I are picked by over half the teachers. The headings
of the first set were as follows: Introduction, The Nature and
Scope of Psychology, The Methods and Subject-Matter of Psychology,
and The Field of Educational Psychology. Notwithstanding the
fact that the title “Introduction” is the one most frequently used in
current textbooks, another heading, “The Field of Educational
Psychology,” is considered the best in seventy-five per cent of the
replies, as is shown by Table I. The other preferred topics or chapter
headings, together with the per cent of teachers who select them, are
revealed in Table I.

* Most of this work was done by students in Advanced Educational Psychology
courses.

The content rank in Table I shows that more of the people who
replied thought that the chapter heading entitled “Factors Influencing
Learning” or one of the roughly synonymous headings should be
included in educational psychology than was true of any other chapter
heading or set of headings; hence, it ranks one. The content rank
also points out that more replies omitted “The Measurement of
Learning” and all comparable headings than was the case relative to
any other major topic included in the results presented in this study.
Consequently this heading ranks fifteen. It is also evident from
Table I that the content selected by the largest number of teachers,
regardless of the exact chapter titles, is indicated by such headings
as “Factors Influencing Learning,” “Heredity and Environment,”
“Individual Differences and The School,” “The Field of Educational
Psychology,” “Growth and Development,” and so on in rank order.


TABLE II. PREFERRED TYPES AND ARRANGEMENT OF REFERENCES, EXERCISES,
AND EXPERIMENTS

References should be included:
    (1) [illegible] .......................................... 25
    (2) At the end of the chapter ............................ 68
    (3) [illegible] ..........................................  2
    (4) [illegible] ..........................................  8
Questions or exercises should be:
    (1) [illegible] ..........................................  7
    (2) [illegible] ..........................................  8
    (3) Both objective and discussion ........................ 82
Exercises should be included:
    (1) At the end of the chapters ........................... 62
    (2) [illegible] ..........................................  2
    (3) [illegible] .......................................... 20
    (4) As a separate series of tests ........................ 18
Experiments to be performed by students should be included:
    (1) At the end of the chapters ........................... 61
    (2) [illegible] ..........................................  8
    (3) [illegible] .......................................... 34


Table II presents the arrangement and types of references, exercises,
and experiments preferable to teachers. The data show that
sixty-eight per cent of the group would rather have references at the end
of a chapter. They also indicate that teachers favor “both objective
and discussion” questions (eighty-two per cent) placed “at the end
of the chapters” (sixty-two per cent). There is less agreement as to
the need for experiments than as to where they should be included;
but sixty-one per cent vote for experiments to be included “at the end
of the chapters.”


IV. CONCLUSIONS 


The following conclusions seem to be justified from the analyses and 
data involved in this study: 

1. Only about twenty per cent of the chapter headings most fre- 
quently used in recent educational psychology textbooks are the ones 
that professors would choose. 

2. Certain chapter titles are preferred by over fifty per cent of the 
professors of educational psychology. “The Field of Educational
Psychology,” for example, is the choice of seventy-five per cent of the 
teachers of this subject. 

3. There are more teachers who believe that the content suggested 
by such a chapter heading as “Incentives and Motives” should be 
incorporated in educational psychology than there are who think that 
subject-matter relative to “The Measurement of Learning” should be
included. 

4. If given the opportunity, teachers would elect—placed at the 
end of the chapters—references, both objective and discussion ques- 
tions, and experiments. 

5. Teachers of educational psychology apparently come more 
nearly to agreeing with reference to the chapter headings and content 
of the introductory course than do recent textbooks. 


REFERENCES 


1. Bolton, F. E.: “Overlapping of Courses in Education.” Educ. Adm. & Super.,
Vol. XIV, 1928, pp. 610-623.

2. Hartmann, G. W.: “The Measurement of the Relative Interest Value of
Representative Items taught in Elementary Psychology.” J. Educ. Psychol.,
Vol. XXIV, 1933, pp. 266-282.

3. Remmers, H. H. and Knight, F. B.: “The Teaching of Educational Psychology
in the United States.” J. Educ. Psychol., Vol. XIII, 1922, pp. 399-407.

4. Seashore, C. E.: “Trial and Error in the Development of the Elementary Course
in Psychology.” School & Soc., Vol. XXXIII, 1931, pp. 782-786.

5. Watson, G. B.: “What shall be taught in Educational Psychology?” J. Educ.
Psychol., Vol. XVII, 1926, pp. 577-599.

6. Werks, H. F., Pickens, H. D. and Randebush, R. I.: “A Comparative Study
of Recent Texts in Psychology, Educational Psychology, and Principles of
Teaching.” J. Educ. Psychol., Vol. XXI, 1930, pp. 327-340.

7. Worcester, D. A.: “The Wide Diversities of Practice in first Course in
Educational Psychology.” J. Educ. Psychol., Vol. XVIII, 1927, pp. 11-17.

8. Worcester, D. A.: “Teachers Problems and Courses in Educational
Psychology.” Educ. Adm. & Super., Vol. XI, 1925, pp. 550-555.


AN EXPERIMENT ON THE LAW OF EFFECT IN 
LEARNING THE MAZE BY HUMANS 


HOMER B. REED 
Ft. Hays Kansas State College 


The law of effect is often used with two different meanings, first, 
that an evaluation of the consequences of a response changes it in 
the direction of the organism’s dominant tendency and, second, that 
a response is strengthened if satisfaction follows it and weakened if 
annoyance follows it. Another frequent formulation of the second 
meaning is that pleasantness stamps in a right reaction and that 
unpleasantness stamps out a wrong one. According to Thorndike, 
the pleasantness or unpleasantness in question somehow acts back 
on the connections that produced the response and either strengthens 
or weakens them, pleasantness strengthening and unpleasantness 
weakening. When we speak of the law of effect in this paper, we have 
in mind the second meaning mentioned above. 

In this experiment the problem was to find out if it is possible to 
learn in a manner which is exactly contrary to the law of effect, that 
is, by punishing right movements and rewarding wrong ones. If it 
is, we have a case in which pleasantness stamps out a right movement 
and unpleasantness stamps in a wrong movement. Such a result 
would either prove the possibility of the law of contradiction or it 
would require a new interpretation of the law of effect. 

This problem was solved in coöperation with Mr. A. A. Lind,
who was the experimenter and who built a maze which consisted of
a wire threaded over a smooth board. In general, it may be described
as an angular T maze. All the subject had to do to solve the maze
was to keep his fingers moving on the wire. The mechanism was
supplied with dry cells and an induction coil, with which the
experimenter could give the subject a shock at any given point in the path
by merely pressing a key.

The subjects used in this experiment were seventy-eight college 
students. Before using them on the experimental maze, they were 
divided into six equal groups on the basis of time and errors required 
to learn a preliminary maze having a design quite different from the 
experimental maze. Each of these groups then learned the experi- 
mental maze according to a prescribed procedure. Some of these 
were definitely in accordance with the law of effect, some were con- 
trary to it, and at least one was neutral. 



The following procedures were used: 

(A) Shock the wrong choice or shock errors.

(B) Shock the right choice or shock corrects.

(C) Say “right” for the wrong choice and “wrong” for a right
choice, also give shock for right choice. This is referred to as: Say
“right” for errors and “wrong” for corrects.

(D) No reward or punishment for right or wrong choices.

(E) Say “right” for errors or wrong moves. Make no comment
for right moves.

(F) Say “right” for correct moves, but make no comment for
others.

Of the above procedures, the A method, shock errors, and the F
method, say “right” for corrects, are in agreement with the law of
effect. The D method, no reward or punishment, is neutral. The B
method, shock corrects, the C method, say “right” for errors and
“wrong” for corrects, and the E method, say “right” for errors, are
contrary to the law of effect. The E method gives only verbal rewards
for errors and the B method gives bodily punishment for correct
movements. The C method gives verbal reward for errors, and both verbal
and bodily punishment for correct movements. Theoretically, according
to the law of effect, it should be impossible to learn by this method.

The instructions given to each subject were as follows: 

“You are to thread this maze as quickly and as accurately as
possible, having your eyes closed or looking away from the maze, and
using only one finger tip in feeling your way. Should you experience
an unpleasant feeling you must decide what to do about it without
further questioning.”

The criterion of learning was the ability to trace the maze once 
without error. A record was taken of the time, the number of trials, 
and the number of errors made to reach this goal. 

The average results in terms of time, trials, and errors are given in
Table I.

TABLE I.—AVERAGE TIME, TRIALS, AND ERRORS REQUIRED TO LEARN MAZE; AND
σ AVERAGE OF EACH MEASURE

Method                                         | M time    | σM    | M trials | σM   | M errors | σM
                                               | (seconds) |       |          |      |          |
(A) Shock errors                               | 652       | 66.79 | 7.18     | 1.02 | 20.27    | 3.61
(B) Shock corrects                             | 848       | 71.57 | 6.82     | 0.75 | 17.18    | 2.22
(C) Say “right” for errors, “wrong” for corrects | 776     | 78.82 | 7.36     | 0.97 | 16.45    | 2.83
(D) No reward or punishment                    | 676       | 97.75 | 7.00     | 0.82 | 26.36    | 2.54
(E) Say “right” for errors                     | 668       | 64.30 | 7.10     | 0.67 | 19.00    | 1.54
(F) Say “right” for corrects                   | 566       | 40.78 | 5.91     | 0.57 | 13.82    | 1.60

On the basis of time, the F method, say “right” for corrects, is the
shortest. The A method, shock errors, is second. The E method, say
“right” for errors, is third. The D method, no reward or punishment,
is fourth. The C method, say “right” for errors and “wrong” for
corrects, and the B method, shock corrects, are fifth and sixth,
respectively. The E, B and C methods are contrary to the law of effect.
The E method ranks third, but the B and C methods, on the basis of
time, make a poor showing. The interesting fact, however, is that the
subjects can learn by these methods, a fact which is a logical
contradiction of the law of effect. On the basis of number of trials,
the B method, shock corrects, is second best, while the C method, say
“right” for errors and “wrong” for corrects, is still the poorest. On
the basis of number of errors, these methods rank third and second,
respectively. Their comparatively low rank on the basis of time is due,
partly, to the fact that the subject hesitates to touch a “hot” wire
and, partly, to the fact that the procedure is contrary to his habits
of learning. Usually correct responses are rewarded and incorrect ones
punished. To reverse this procedure confuses the subject at first,
prolongs his learning time, and increases his errors. But after he
discovers the reversed interpretation of shock, and of the
announcements “right” and “wrong,” he may improve very rapidly. To test
this hypothesis, the average time and errors after the first trial are
given in Table II for each method.

TABLE II.—AFTER FIRST TRIAL

Method                                           | M time    | σM    | M errors | σM
                                                 | (seconds) |       |          |
(A) Shock errors                                 | 484       | 53.96 | 15.72    | 1.05
(B) Shock corrects                               | 541       | 67.83 | 12.45    | 0.72
(C) Say “right” for errors, “wrong” for corrects | 566       | 66.82 | 13.18    | 0.90
(D) No reward or punishment                      | 550       | 76.00 | 21.36    | 2.63
(E) Say “right” for errors                       | 447       | 56.30 | 13.36    | 1.50
(F) Say “right” for corrects                     | 410       | 38.10 | 10.00    | 1.29
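The rank orders quoted in the text can be recomputed from the Table I averages. The sketch below is only an illustration: the dictionary `TABLE_I` and the helper `rank_methods` are names introduced here, not part of the original study, and the figures are simply transcribed from Table I.

```python
# Averages transcribed from Table I: (time in seconds, trials, errors).
# TABLE_I and rank_methods are illustrative names, not from the paper.
TABLE_I = {
    "A": (652, 7.18, 20.27),  # shock errors
    "B": (848, 6.82, 17.18),  # shock corrects
    "C": (776, 7.36, 16.45),  # say "right" for errors, "wrong" for corrects
    "D": (676, 7.00, 26.36),  # no reward or punishment
    "E": (668, 7.10, 19.00),  # say "right" for errors
    "F": (566, 5.91, 13.82),  # say "right" for corrects
}
MEASURES = ("time", "trials", "errors")


def rank_methods(table, measure):
    """Return the method letters ordered from best (smallest average) to poorest."""
    column = MEASURES.index(measure)
    return sorted(table, key=lambda method: table[method][column])


if __name__ == "__main__":
    for measure in MEASURES:
        print(measure, rank_methods(TABLE_I, measure))
```

Ranking on the smallest average treats every difference as meaningful, however small; whether a given difference is reliable is a separate question that the ranking itself does not answer.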













In this table, the B method, shock corrects, ranks fourth on the
basis of time; and second, on the basis of errors. The C method, say
“right” for errors and “wrong” for corrects, on the basis of time,
still ranks sixth. But on the basis of number of errors, these methods
rank second and third, respectively. These results indicate that their
poor showing on the basis of total time was due not so much to the
alleged stamping-out tendency of unpleasantness and the stamping-in
tendency of pleasantness as to the fact that the reversal of the usual
procedure in learning confused the subject. If we ignore the first
trial and base our judgment on the efficiency of the various methods
on the number of errors we can say that punishing correct movements
is a more effective method of learning than punishing wrong
movements.

If we compare the efficiency of the six methods on the basis of the
ratio of difference to its standard deviation, we can say that, in most
cases, there are no reliable differences between them. Out of
seventy-five comparisons, there are only five ratios that are equal to or
greater than three sigma. On the basis of total time, we can say that the B
method, shock corrects, is inferior to the F method, say “right” for
corrects, but it is not reliably inferior to any of the other methods.
On the basis of total errors, we can say that the D method, no reward
or punishment, is inferior to the F method, say “right” for corrects;
and also that the chances are ninety-nine out of one hundred that it
is inferior to the B method, shock corrects. On the basis of the total
number of trials, no method is reliably superior to another, the highest
ratio being 1.4 sigma. On the basis of total time after the first trial,
no method is reliably superior to another, the highest ratio being 1.7
sigma. On the basis of total errors after the first trial, the D method,
no reward or punishment, and the A method, shock errors, are
inferior to the F method, say “right” for corrects. The chances are
ninety-nine out of one hundred that the B method, shock corrects, is
superior to the A method, shock errors; and one hundred out of a
hundred that it is superior to the D method, no reward or punishment.
The C method, say “right” for errors and “wrong” for corrects, is
decidedly better than the D method, no reward or punishment.
Perhaps the most significant results on the basis of the ratios of the
differences to their standard deviations are first, the decided inferiority
of the method of giving no reward or punishment; second, the
superiority of punishing the right and of rewarding the wrong to giving no
reward or punishment; and third, the superiority of punishing right
responses to punishing wrong responses from the standpoint of the
number of errors.
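The ratio of a difference to its standard deviation can be sketched as follows. The paper does not print the formula it used, so this example assumes the usual expression for two independent groups, in which the sigma of the difference is the square root of the sum of the squared sigmas of the two means; the function name `critical_ratio` is introduced here only for illustration, and the figures are those printed in Table I.

```python
import math


def critical_ratio(mean_a, sigma_a, mean_b, sigma_b):
    """Ratio of the difference between two means to the sigma of that difference.

    Assumes independent groups, so that
    sigma_diff = sqrt(sigma_a**2 + sigma_b**2).
    This is an assumption; the paper does not state its formula.
    """
    sigma_diff = math.sqrt(sigma_a ** 2 + sigma_b ** 2)
    return abs(mean_a - mean_b) / sigma_diff


if __name__ == "__main__":
    # Total time, B (shock corrects) against F (say "right" for corrects).
    print(round(critical_ratio(848, 71.57, 566, 40.78), 2))   # about 3.4
    # Total errors, D (no reward or punishment) against F.
    print(round(critical_ratio(26.36, 2.54, 13.82, 1.60), 2))  # about 4.2
```

Under this assumption both comparisons exceed three sigma, which agrees with the statements that the B method is inferior to the F method in total time and the D method inferior to the F method in total errors.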

These results take on considerable significance from the standpoint 
of the law of effect. They indicate that it is possible to learn in a 
manner contrary to it, and that, instead of pleasantness strengthening 
right responses and unpleasantness weakening them, they may produce 
opposite results. In two of our methods, the subject experiences a pain 
at every right turn. If pain is unpleasant and if unpleasantness works 
back on the connections and weakens them, then the right responses 
should become fewer and fewer and eventually be eliminated, the result 
being that the subject fails to get through the maze. But this does 
not happen. In the B method, shock corrects, he gets through with 
fewer mistakes than in the A method, shock errors. Of course, we 
can still say that the pleasantness of having finished a series of unpleas- 
ant experiences is far greater than the total of the unpleasant experi- 
ences and that it strengthens the reactions leading to the finish far 
more than the unpleasant shocks along the way weaken the individual 
turns, but such a position is at least contrary to the experience of our 
subjects, and it would also seem that the conflict between the unpleas- 
antness of shocks and the pleasantness of ending them would, according 
to the law, make this method much less economical than methods in 
agreement with the law. But the facts are that the shock-corrects
method yields fewer mistakes than the shock-errors method.

In view of this situation, it is probably safer to say that pleasantness
and unpleasantness, or satisfaction and annoyance as such, have
nothing to do with the strengthening or weakening of a response.
This, however, does not mean that they have nothing to do with
learning. We saw that punishing right responses and rewarding wrong
ones is decidedly better than giving no reward or punishment. This
may be partly due to the fact that strong affective experiences “key
up” the organism, make the subject more alert, and so increase his
learning power, but we believe rather that pleasantness and
unpleasantness serve as cues for selecting the response that leads to the
learner’s goal. If the learner once discovers that pain or the
announcement “wrong” is a sign of the correct movements, he quickly selects
his responses accordingly. The important fact, then, about
pleasantness and unpleasantness is not the affective quality but the
interpretation with reference to the goal. If so, learning is not at all the
physiological after-effect of satisfaction and annoyance, but rather the
result of the interpretation of these qualities. I should say the same
about the satisfaction coming from completing the maze successfully.
Instead of this acting back on connections previously made, it is much
simpler to think of this knowledge of success as enabling the learner
to make up his mind what to do in the next trial to make a better
performance. Again it is the interpretation of the result with reference
to the next series of acts that is important.







THE RELIABILITY OF THE GOODENOUGH DRAW A 
MAN TEST AND THE VALIDITY AND RELIABILITY 
OF AN ABBREVIATED SCORING METHOD 


MOSHE BRILL 
Ohio State University 


The Goodenough Draw a Man intelligence test¹ attracted wide
attention among both clinical psychologists and research workers in
mental testing. Several studies primarily concerned with the validity
of the Goodenough test were reported.² The only study, besides
Goodenough’s own work, on the reliability of the test is the one by
Yepsen.³

The purpose of the present study is to determine the reliability
of the Goodenough test by the re-test method. This study is also
concerned with the validity and reliability of an Abbreviated scoring
method for the same test. This method has been described by the
writer elsewhere.⁴

Goodenough reported reliability coefficients by both the retest and
the split-scale methods. She found a reliability coefficient of
.937 ± .006 by re-testing on the following day one hundred ninety-four





1 Goodenough, Florence L.: The Measurement of Intelligence by Drawings.
World Book Co., 1926.

2 Berrien, F. K.: “A Study of the Drawings of Abnormal Children.” Journal
of Educational Psychology, Vol. XXVI, 1935, pp. 143-150.

Earl, C. J. C.: “The Human Figure Drawings of Feeble-Minded Adults.”
Proceedings and Addresses of the American Association on Mental Deficiency,
Vol. XXXVIII, 1933, pp. 107-120.

Hinrichs, W. E.: “The Goodenough Drawing in Relation to Delinquency and
Behavior.” Archives of Psychology, No. 175, January, 1935.

McElwee, Edna W.: “The Reliability of the Goodenough Intelligence Test
Used with Sub-Normal Children Fourteen Years of Age.” Journal of Applied
Psychology, Vol. XVI, 1932, pp. 217-218.

Williams, J. H.: “Validity and Reliability of the Goodenough Intelligence
Test.” School and Society, Vol. XLI, May 11, 1935, pp. 656-659.

Williams, M. L.: “The Growth of Intelligence as Measured by the
Goodenough Drawing Test.” Journal of Applied Psychology, Vol. XIV, 1930,
pp. 239-256.

3 Yepsen, L. N.: “The Reliability of the Goodenough Drawing Test with
Feeble-Minded Subjects.” Journal of Educational Psychology, Vol. XX, 1929,
pp. 448-451.

4 Brill, M.: A Comparative Study of the Performance of Adjusted and Maladjusted
Mentally Deficient Boys on Twenty-Two Tests and Scales. Ph. D. Thesis, School of
Education, New York University, 1935.



first-grade children. Concerning this method of establishing the
reliability of the test, Goodenough states: “This, of course, means
nothing more than that the performance of children on a task of this
sort does not ordinarily undergo much change from day to day; but
how much of this apparent stability is due to actual stability of the
trait measured, and how much to mere memory transference or fixation
of motor habit is a question which remains unanswered.”¹

The second method used by her was that of dividing the scale
into two independent sub-scales and then estimating the reliability
of the total by means of the Spearman-Brown formula. Reliability
coefficients were thus calculated for both sexes and all ages between
five and ten years. One hundred cases were included in each group.
The range of reliability coefficients found was between .639 ± .015
and .891 ± .023. The average coefficient of reliability for ages
five to ten years, taken separately, was found to be .77. Goodenough
criticizes this method of determining the reliability of her scale. She
states: “The objection to this method lies in the fact that it was not
found possible to make such a division of points as to give
approximately equal weight, in each of the two sub-scales thus formed, to
the several mental functions which may be assumed to be in some
degree measured by the test (such as memory for items, sense of
proportion, motor coordination, etc.) and at the same time keep the
two scales independent of each other.”²
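The Spearman-Brown step mentioned above, estimating the reliability of the whole scale from the correlation between two half-scales, can be sketched as follows. The function name `spearman_brown` and the half-scale correlation of .60 are hypothetical values chosen for illustration; they are not figures reported by Goodenough.

```python
def spearman_brown(r_part, factor=2.0):
    """Estimate the reliability of a test lengthened by `factor`.

    With factor=2 this is the split-half case: 2r / (1 + r).
    """
    return factor * r_part / (1.0 + (factor - 1.0) * r_part)


if __name__ == "__main__":
    # Hypothetical half-scale correlation, for illustration only.
    r_half = 0.60
    print(round(spearman_brown(r_half), 2))
```

The formula assumes that the two halves are equivalent measures of the same functions, which is precisely the condition Goodenough reports she could not satisfy while keeping the sub-scales independent.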

Yepsen reported a study in which the Goodenough Drawing Test
was given three times with four days between each administration
to thirty-seven feeble-minded boys between the ages of 9.0 and 18.2.
He reported the following coefficients of reliability: between first and
second administration .89; between second and third .91; between
first and third .91. Yepsen also studied the amount of change from
one administration to the following ones. He concluded: “The
Goodenough tests can be successfully applied to feeble-minded subjects
after the original administration with a high degree of reliability.
Approximately fifty per cent of the cases will remain the same;
twenty-five per cent will increase; twenty-five per cent will decrease. The
variability will rarely exceed 1.0 years.”³





1 Goodenough, Florence L.: “A New Approach to the Measurement of the
Intelligence of Young Children.” Pedagogical Seminary, Vol. XXXIII, 1926,
p. 195.

2 Ibid., p. 196. 

3 Yepsen, L. N.: Op. cit., p. 451. 




























The subjects for this study were the school children at the New
Jersey State Colony for feeble-minded males at New Lisbon, N. J.
The Goodenough Draw a Man test was administered in the regular
classrooms three times. Eighteen days intervened between the
first and second administration; and the third administration was
given twenty-five days after the second one. The drawings were
scored and re-scored to eliminate errors, according to Goodenough’s
instructions. Eighty-five boys were present during the first
administration of the test; eighty-three during the second; and seventy-seven
during the third. Of these, seventy-three boys were present during
both the first and second administrations; sixty-seven during the
first and third; and sixty-five during the second and third.

TABLE I.—DISTRIBUTION OF GOODENOUGH SCORES AND TEST-AGES IN THE FIRST,
SECOND, AND THIRD ADMINISTRATIONS

                                   |        Administration
Score     | Test age in months     | First  | Second | Third
 0- 3     |  36- 45                |   2    |   1    |   1
 4- 7     |  48- 57                |   5    |   3    |   3
 8-11     |  60- 69                |   8    |   5    |   8
12-15     |  72- 81                |   6    |  21    |  18
16-19     |  84- 93                |  15    |  20    |  12
20-23     |  96-105                |  18    |  12    |  10
24-27     | 108-117                |  11    |   6    |   7
28-31     | 120-129                |   7    |   2    |   7
32-35     | 132-141                |   5    |   9    |   7
36-39     | 144-153                |   7    |   4    |   2
40-43     | 156-156+               |   1    |  ...   |   2
N                                  |  85    |  83    |  77
Median    | Score                  |  20    |  18    |  18
          | TA                     |  96    |  90    |  90
Q1        | Score                  |  16    |  14    |  13
          | TA                     |  84    |  78    |  75
Q3        | Score                  |  27    |  24    |  26
          | TA                     | 117    | 108    | 114
Q         | Score                  |   5.5  |   5.0  |   6.5
          | TA                     |  16.5  |  15.0  |  19.5


It appears from Table I that although the differences between
the medians of the three administrations are statistically unreliable,
there is a definite tendency for a lowering of scores or test-ages on
each successive administration. The distribution of the second
administration tends to greater compactness than that of the first
one, and the distribution of the third administration tends to greater
spread than that of the previous ones.


TABLE II.—AMOUNT OF CHANGE IN YEARS AND MONTHS FROM FIRST TO SECOND,
FIRST TO THIRD, AND SECOND TO THIRD ADMINISTRATION

Years and months                 | I to II       | I to III      | II to III
+8:0                             | ...           | 1             | ...
+4:6                             | ...           | ...           | 1
+4:0                             | 2             | ...           | ...
+3:6                             | ...           | ...           | 2
+3:0                             | 2             | 1             | 2
+2:6                             | 1             | 1             | 2
+2:0                             | 1             | 4             | 1
+1:6                             | 2             | 5             | 2
+1:0                             | 4             | 4             | 2
+0:6                             | 7             | 8             | 10
 0                               | 20            | 16            | 21
-0:6                             | 11            | 7             | 11
-1:0                             | 10            | 10            | 3
-1:6                             | 5             | 3             | 6
-2:0                             | 2             | 2             | ...
-2:6                             | 3             | 1             | ...
-3:0                             | 2             | 2             | 1
-3:6                             | ...           | 1             | ...
-4:0                             | 1             | ...           | 1
-4:6                             | ...           | 1             | ...
N                                | 73            | 67            | 65
Range                            | -4:3 to +4:3  | -4:6 to +8:0  | -4:0 to +4:6
Median increase                  | 0:9           | 1:0           | 0:9
Median decrease                  | 1:0           | 1:0           | 0:9
Per cent no change               | 27.40         | 23.84         | 32.34
Per cent between -1:0 and +1:0   | 52.06         | 46.19         | 63.68
Per cent above +1:0              | 16.44         | 23.84         | 18.48
Per cent below -1:0              | 31.51         | 29.80         | 16.94














A study of Table II indicates that the median increase and median
decrease in test-age from one administration to the other do not
exceed one year. In only one case was there an increase as great as
eight years from the first to the third administration. The tendency
to change from the first to the second and third administrations is
more towards decrease than increase. The chance for a decrease of
one year or more from the first to the second administration is almost
twice as great as the chance for an increase of the same amount.
There is a greater tendency to increase than to decrease from the
second to the third administration. This tendency, however, is
statistically unreliable. After a lapse of twenty-five days following
the second administration, approximately sixty-five per cent of the
subjects changed less than plus or minus one year; nineteen per cent
increased and seventeen per cent decreased one year or more. The
agreement between the second and third scores is considerably higher
than the one between any other pair of scores. This is perhaps due to
a loss of interest in the task on the part of the subjects on the second
administration and to a re-aroused interest in the same task on the
third administration with a greater lapse of time in the interim.

In a comparative study of the drawings of adjusted and
maladjusted mentally deficient boys, the following twenty items were found
reliably to differentiate between the two groups (the item numbers
refer to those in Goodenough’s original monograph): 4c, 5a, 5b, 6a, 7a,
7d, 8a, 8b, 9a, 9d, 10a, 10c, 12c, 13, 14a, 14c, 14d, 14e, 14f. A credit
of one point was allowed for each of the twenty items scored plus.
The scoring was otherwise in strict accordance with the requirements
stated by Goodenough. The Spearman-Product-Moment coefficient
of correlation between the original and the abbreviated Goodenough
scores was found to be .92 ± .010.¹

To study the reliability and validity of the abbreviated Good- 
enough score, each drawing in the present investigation was re-scored 
by this method and a statistical analysis of the data made. 

In Tables III and IV one notices the same tendencies as observed
in Tables I and II. There is a trend to lowering of scores in each
successive administration. The differences between the central
tendencies of the three distributions, however, are statistically
unreliable.

As in Yepsen’s study, conclusive evidence of the agreement existing 
between the successive scores is indicated by the coefficients of correla- 
tion in Table V. Although the coefficients of reliability are lower 
than those found by Goodenough and Yepsen, they are sufficiently 
high to warrant the conclusion that the Goodenough test can be 
successfully applied with feeble-minded subjects from three to six 
weeks after the original administration with a high degree of reliability. 
There is a drop in the coefficients of reliability of the Abbreviated 





1 Brill, M.: Op. cit. 







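TABLE III.—DISTRIBUTION OF ABBREVIATED SCORES

          |        Administration
Score     | First  | Second | Third
 0        |   2    |   2    |   1
 1        |   1    |  ...   |  ...
 2        |   3    |   2    |   1
 3        |   3    |   4    |   3
 4        |   6    |   5    |   8
 5        |   3    |  14    |  10
 6        |   5    |   8    |   7
 7        |   4    |   6    |   4
 8        |  11    |   5    |   8
 9        |   9    |   5    |   6
10        |   6    |   7    |   5
11        |   7    |   5    |   6
12        |   3    |   5    |  ...
13        |   5    |   1    |   7
14        |   8    |   6    |   4
15        |   3    |   2    |   1
16        |   2    |   3    |   3
17        |   4    |   3    |   3
N         |  85    |  83    |  77
Median    |   9    |   8    |   8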











TABLE IV.—AMOUNT OF CHANGE IN ABBREVIATED SCORES FROM FIRST TO SECOND,
FIRST TO THIRD, AND SECOND TO THIRD ADMINISTRATIONS

Number points changed            | I to II | I to III | II to III
+10+                             | 1       | 1        | ...
+ 8                              | ...     | ...      | 2
+ 6                              | 2       | 1        | 3
+ 4                              | 2       | 5        | 4
+ 2                              | 9       | 7        | 6
  0                              | 32      | 28       | 38
- 2                              | 17      | 15       | 8
- 4                              | 4       | 7        | 3
- 6                              | 5       | 2        | ...
- 8                              | ...     | ...      | ...
-10+                             | ...     | 1        | 1
N                                | 73      | 67       | 65
Per cent no change               | 16.44   | 17.91    | 24.
Per cent change less than ±2     | 43.84   | 41.72    | 58.
















scoring method, yet the correlations between the original and the 
Abbreviated scoring methods are sufficiently high to warrant the 
conclusion that the Abbreviated Goodenough score is a valid and 
reliable measure of intelligence, or whatever is measured by the original 
Goodenough score. 


TABLE V.—CORRELATIONS

Correlations between                                       | r ± PE      | N
First and second administrations, original scores          | .77 ± .032  | 73
First and third administrations, original scores           | .68 ± .044  | 67
Second and third administrations, original scores          | .80 ± .030  | 65
First and second administrations, abbreviated scores       | .73 ± .037  | 73
First and third administrations, abbreviated scores        | .65 ± .048  | 67
Second and third administrations, abbreviated scores       | .75 ± .037  | 65
Original and abbreviated scores, first administration      | .95 ± .007  | 93
Original and abbreviated scores, second administration     | .98 ± .003  | 90
Original and abbreviated scores, third administration      | .92 ± .011  | 85
First and second administrations, test ages                | .77 ± .033  | 73
First and third administrations, test ages                 | .67 ± .045  | 67
Second and third administrations, test ages                | .81 ± .030  | 65
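
The probable errors attached to the coefficients in Table V can be checked approximately with the classical formula for the probable error of a product-moment correlation, PE = .6745 (1 - r²) / √N. The paper does not say which formula was used, so treating it as this one is an assumption; the function name `probable_error_of_r` is introduced here for illustration only.

```python
import math


def probable_error_of_r(r, n):
    """Classical probable error of a correlation coefficient.

    PE = 0.6745 * (1 - r**2) / sqrt(n). Whether the paper used exactly
    this expression is an assumption.
    """
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


if __name__ == "__main__":
    # (r, N) pairs taken from the original-score rows of Table V.
    for r, n in [(0.77, 73), (0.68, 67), (0.80, 65)]:
        print(r, n, round(probable_error_of_r(r, n), 3))
```

For these three rows the formula gives values of about .032, .044, and .030, in line with the printed entries.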











SUMMARY 


A study of the reliability of the Goodenough Draw a Man intelli- 
gence test, and of the validity and reliability of an abbreviated scoring 
method for the same test was made. The Goodenough test was given 
three times to the school children in a State institution for feeble- 
minded males. Eighteen days intervened between the first and 
second administrations, and twenty-five days between the second and 
third. Seventy-three boys were present in both the first and second 
administrations; sixty-seven in the first and third; and sixty-five in the 
second and third. The abbreviated scoring method consisted of 
twenty items that were found reliably to differentiate between socially 
adjusted and maladjusted mentally deficient boys. To study the 
reliability and validity of the abbreviated Goodenough score, each 
drawing in the present investigation was re-scored by this method and 
an analysis of the data made. 







































The following conclusions may be drawn from this study: 

The Goodenough Draw a Man intelligence test can be successfully 
applied as a group test with feeble-minded boys from three to six 
weeks after the original administration with a high degree of reliability. 

There is a tendency to a decrease in the Goodenough test-age 
after a lapse of three weeks. However, one may expect over fifty 
per cent of the cases to change less than plus or minus one year. The 
chance for a decrease of one year or more, within a period of three 
weeks, is almost twice as great as the chance for an increase of the 
same amount. 

After a lapse of six weeks following the first administration, one 
may expect over forty-six per cent of the cases to change less than 
plus or minus one year. The chance for a decrease of one year or more 
is only slightly higher than the chance for an increase of the same 
amount. 

After a lapse of four weeks following a second administration of 
the Goodenough test one may expect about sixty-five per cent of the 
cases to change less than plus or minus one year. The chance for an 
increase of one year or more is slightly greater than the chance for a 
decrease of the same amount. 

The agreement between the second and third scores is considerably 
higher than the agreement between any other pair of scores. 

The abbreviated Goodenough score appears to be a valid and 
reliable measure of intelligence, or whatever is measured by the 
original Goodenough score. 



TWIN DIFFERENCES IN INTELLIGENCE 


D. CECIL RIFE 
Ohio State University 


The study of identical twins is undoubtedly one of the best avenues 
to the approach of the nature-nuture problem. Not only have we 
learned a great deal about physical traits, but with the aid of intelli- 
gence tests, we have also determined something of the range in IQ 
to be expected between individuals possessing the same hereditary 
make-up. The intra-pair differences obtained by various investigators 
agree quite closely for identical twins reared together, namely ; approxi- 
mately five points. While this is a rather insignificant difference, 
occasionally identical twins show intra-pair differences several points 
higher, enough to be significant, although small when compared to 
ordinary sib differences. Often such a difference is accompanied 
by a corresponding difference in physical make-up. One interesting 
feature of such twin pairs is that the physical differences are usually 
uniform throughout; for example, the heavier twin is usually the taller, 
and the larger in other measurements by about the same proportion. 
In sibs and fraternal twins, on the other hand, the differences are not 
uniform; for example, the heavier individual may be the shorter, or 
the one having the longer limbs may have the shorter trunk. In 
my! investigations of twenty pairs of identical twins those showing 
the greatest intra-pair physical differences, also, in the majority of 
cases, showed the greatest differences in IQ, (according to the Stanford 
revision of the Binet-Simon). For example, in the most dissimilar 
pair of twins investigated, the superior twin was a year ahead of her 
sister in school, was an inch taller, twelve pounds heavier and had 
an IQ fourteen points higher than her sister. Yet qualitatively they 
were extremely similar. The impression gathered from working with 
this pair was that the inferior twin was an almost exact duplicate of 
her sister, except that her development had been retarded. Such 
quantitative differences do not seem inconsistent, in view of the fact 
that identical twins may not always receive equal nourishment during 
embryonic development. Newman² states:


Separate one-egg twins often develop unequally. One probable cause is 
that in the zone of competition, during embryonic development, the separate 
placental circulations of the twins come very closely into contact and more or 
less anastomoses of veins, capillaries and arteries takes place between the two
circulations. Although small in volume, much of the welfare of the twins
depends on whether or not the region of intercommunication is symmetrical. 


The most striking physical similarities in identical twins are, for
the most part, qualitative rather than quantitative, and of a rather
intangible nature, as, for example, gait, voice, mannerisms and features.
We found the same to be true of responses to the Binet-Simon
tests. Although IQ tells something in regard to the amount of 
intelligence possessed by an individual, two individuals may make the 
same scores by passing different parts of the same test, or by answering 
the same parts, but with different responses. In my investigations 
each individual was tested alone, and no opportunity was given the 
members of a pair to get together between tests. I was impressed 
with the similarity in responses in members of a pair. For example, 
in a certain pair, each twin, when asked the meaning of lotus,
responded: “Lotus is an animal, no, I was thinking of locust.” One
twin went on to state that lotus is a kind of flower, while her mate did
not. When given the problem, “At fifteen cents a yard, how much
will seven feet of cloth cost?” both members of another identical
twin pair incorrectly responded, “Thirty-seven and one-half cents.”
When asked to go back over the problem and show how they had 
arrived at their answer, their errors in calculation were found to be 
identical. Such similarities are continually encountered when testing 
identical twins. 
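
For reference, the arithmetic of the cloth problem can be checked with a short Python sketch. The prices and lengths are those quoted above; the function name is illustrative only. How the twins reached their figure is not recorded in the text, so the second calculation merely restates their answer as a length of cloth and should not be read as their working.

```python
from fractions import Fraction

PRICE_PER_YARD_CENTS = 15   # from the problem as quoted
LENGTH_FEET = 7             # from the problem as quoted
FEET_PER_YARD = 3


def cost_in_cents(length_feet, price_per_yard):
    """Cost of cloth priced by the yard, for a length given in feet."""
    return Fraction(length_feet, FEET_PER_YARD) * price_per_yard


# Correct answer to the problem as stated.
correct = cost_in_cents(LENGTH_FEET, PRICE_PER_YARD_CENTS)
print(correct)

# Length, in yards, that the twins' answer of 37 1/2 cents would buy.
yards_implied = Fraction(75, 2) / PRICE_PER_YARD_CENTS
print(yards_implied)
```

The correct cost is thirty-five cents; the answer given by both twins is the price of two and one-half yards.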

The question naturally arises as to whether or not such similarities 
are due to their being reared in similar environments, or to their 
having the same genetic make-up. The obvious approach to the 
problem is to compare such test results with the results of similar tests 
given to fraternal twins. Accordingly, the test scores of sixteen 
pairs of identical twins (reared together) were compared with the 
tests of five pairs of fraternal twins (reared together). Originally 
more pairs of fraternal twins had been tested, but because of their 
great intra-pair differences in total scores, several fraternal pairs were 
not used. 

The following method was used in making the comparison: Starting 
at the age-level in which one or both members of the pair made one 
or more failures, the number of correct responses made by the indi- 
vidual having the lower IQ was obtained. The next step was to 
determine what percentage of these responses was also answered 
correctly by the other member of the pair. The case below illustrates 
the method used. Twin A passed five tests correctly, while B passed
six. Of the five tests passed by A, four were passed by B, or eighty
percent.

Year XII.*     Twin A     Twin B
    1            +          +
    2            -          +
    3            +          -
    4            +          +
    5            -          -
    6            -          +
    7            +          +
    8            +          +
Year XIV.†

* Both passed all tests up to year XII.
† Both failed all tests here and above.

It can readily be seen that such a method is not complicated
by differences in total IQ, except where such differences are quite large,
and no such cases were included. The mean percentage of agreement 
obtained for the identical twins was .956 ± .013, and for fraternal
twins, .760 ± .039. The difference of the means is .196 ± .041, or,
in other words, there is only one chance in eight hundred thirty-three 
that the difference is not real. 
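
The agreement measure and the significance statement above can be reproduced with a short sketch in Python. It assumes that the ± figures are probable errors (the usual convention of the period, though the text does not say so), that the two group means are independent, and that chance differences follow a normal distribution. The function names are illustrative only and are not the author's.

```python
import math

# Pass/fail record of the illustrative pair at Year XII (True = passed),
# transcribed from the case given in the text.
TWIN_A = [True, False, True, True, False, False, True, True]
TWIN_B = [True, True, False, True, False, True, True, True]


def agreement(lower_scorer, other):
    """Fraction of the tests passed by the lower-scoring twin that the
    other twin also passed."""
    passed = [i for i, ok in enumerate(lower_scorer) if ok]
    if not passed:
        raise ValueError("lower-scoring twin passed no tests in the range")
    shared = sum(1 for i in passed if other[i])
    return shared / len(passed)


def difference_with_pe(mean_1, pe_1, mean_2, pe_2):
    """Difference of two independent means and its probable error,
    combining the two probable errors in quadrature."""
    return mean_1 - mean_2, math.hypot(pe_1, pe_2)


def odds_against_chance(diff, pe_diff):
    """Approximate 'one chance in N' that so large a difference arises
    by chance, ASSUMING normality and a two-sided test."""
    # One probable error is about 0.6745 standard deviations.
    z = 0.6745 * abs(diff) / pe_diff
    p_two_sided = math.erfc(z / math.sqrt(2))
    return 1 / p_two_sided


print(agreement(TWIN_A, TWIN_B))
diff, pe = difference_with_pe(0.956, 0.013, 0.760, 0.039)
print(round(diff, 3), round(pe, 3))
print(round(odds_against_chance(diff, pe)))
```

On the figures quoted in the text, the sketch gives an agreement of .80 for the illustrative pair and a difference of about 4.8 probable errors, which corresponds to odds of several hundred to one, the same order as the figure given above. Exact agreement is not to be expected if the original value was read from a rounded table.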

The Stanford revision of the Binet-Simon was the only intelligence 
test employed. The statistical analysis simply tells us that identical 
twins reared together are more similar qualitatively in mental traits, 
as measured by the Binet scale, than are fraternal twins reared together. 
Although these data suggest that the kind as well as the total amount 
of such intelligence is innate, it should be remembered that the environ- 
ment of identical twins is more similar than that of fraternal twins. 
If the greater correspondences in the environments of identical twins 
are due to innate similarities in likes and dislikes, then the qualitative
similarities in responses to the Binet tests have a genetic basis. If 
they are due to parents and other associates treating them more alike 
than they would fraternal twins, such differences are a matter of 
training. The ultimate solution of the problem should be obtained 
by a comparison of identical twins reared apart with fraternal twins 
reared apart. It would also be desirable to use other types of intelli- 
gence tests. The value of studying identical twins reared apart is 
generally recognized,³ whereas the need for comparing fraternal twins
reared apart is frequently overlooked. It is no more logical to use
fraternal twins reared together as controls for identical twins reared 
apart, than it would be to use fraternal twins reared apart as controls 
for identical twins reared together. Finally, it should be remembered 
that no two individuals have either absolutely the same or totally
different environments. 


REFERENCES 


1. Rife, D. C.: Jour. Hered., Vol. XXIV, 1933, pp. 339, 407, 443.
2. Newman, H. H.: The Physiology of Twinning. University of Chicago Press, 1923.
3. Newman, H. H.: Jour. Hered., Vol. XXIII, XXV. (A series of articles.)














































AMERICAN PSYCHOLOGICAL PERIODICALS 


American Journal of Psychology—Ithaca, N. Y.; Cornell University.
Subscription $6.50. 624 pages annually. Edited by M. F. Washburn, Madison
Bentley, K. M. Dallenbach, and E. G. Boring.
Quarterly. General and experimental psychology. Founded 1887.

Journal of Genetic Psychology—Worcester, Mass.; Clark University Press.
Subscription $14.00 per yr.; $7.00 per vol. 1,000 pages ann. (2 vols.). Edited by Carl
Murchison.
Quarterly. Child behavior, animal behavior, comparative psychology. Founded 1891.

Psychological Review—Princeton, N. J.; Psychological Review Company.
Subscription $5.50. 540 pages annually. Edited by Herbert S. Langfeld.
Bi-monthly. General psychology. Founded 1894.

Psychological Monographs—Princeton, N. J.; Psychological Review Company.
Subscription $6.00 per vol. 500 pages. Edited by Joseph —
Without fixed dates, each issue one or more researches. Founded 1895.

Psychological Index—Princeton, N. J.; Psychological Review Company.
Subscription $4.00. 400-500 pages. Edited by Walter S. Hunter and R. R. Willoughby.
An annual bibliography of psychological literature. Founded 1895.

Psychological Bulletin—Princeton, N. J.; Psychological Review Company.
Subscription $6.00. 720 pages annually. Edited by Edward S. Robinson.
Monthly (10 numbers). Psychological literature. Founded 1904.

Archives of Psychology—New York, N. Y.; Columbia University.
Subscription $6.00. 500 pages per volume. Edited by R. S. Woodworth.
Without fixed dates, each number a single experimental study. Founded 1906.

Journal of Abnormal and Social Psychology—Eno Hall, Princeton, N. J.; American
Psychological Association.
Subscription $5.00. 448 pages annually. Edited by H. T. Moore.
Quarterly. Abnormal and social. Founded 1906.

Psychological Clinic—Philadelphia, Pa.; Psychological Clinic Press.
Subscription $3.00. 288 pages. Edited by Lightner Witmer.
Without fixed dates (Quarterly). Orthogenics, psychology, hygiene. Founded 1907.

Journal of Educational Psychology—Baltimore: Warwick & York.
Subscription $6.00. 720 pages. Monthly except June to August.
Edited by J. W. Dunlap, P. M. Symonds and H. E. Jones. Founded 1910.

Psychoanalytic Review—Washington, D. C.; 3617 10th St., N. W.
Subscription $6.00. 500 pages annually. Edited by W. A. White and S. E. Jelliffe.
Quarterly. Psychoanalysis. Founded 1913.

Journal of Experimental Psychology—Princeton, N. J.; Psychological Review Company.
Subscription $7.00. 900 pages annually. Edited by S. W. Fernberger.
Bi-monthly. Experimental psychology. Founded 1916.

Journal of Applied Psychology—Indianapolis; C. E. Pauley & Co.
Subscription $5.00. 600 pages annually. Edited by James P. Porter, Ohio University,
Athens, Ohio. Bi-monthly. Founded 1917.

Journal of Comparative Psychology—Baltimore, Md.; Williams & Wilkins Company.
Subscription $5.00 per volume. Ed. by Knight Dunlap and Robert M. Yerkes.
Two volumes a year. Founded 1921.

Comparative Psychology Monographs—Baltimore, Md.; The Johns Hopkins Press.
Published without fixed dates, each number a single research. Founded 1922.

Genetic Psychology Monographs—Worcester, Mass.; Clark University Press.
Subscription $7.00 per vol. One volume per year. Edited by Carl Murchison.
Bi-monthly. Each number one complete research. Child behavior, animal behavior, and
comparative psychology. Founded 1925.

Psychological Abstracts—Eno Hall, Princeton, N. J.; American Psychological Association.
Subscription $6.00. 700 pages ann. Edited by Walter S. Hunter and R. R. Willoughby.
Monthly. Abstracts of psychological literature. Founded 1927.

Journal of General Psychology—Worcester, Mass.; Clark University Press.
Subscription $14.00 per yr.; $7.00 per vol. 1,000 pages ann. (2 vols.). Edited by Carl
Murchison.
Quarterly. Experimental, theoretical, clinical, historical psychology. Founded 1927.

Journal of Social Psychology—Worcester, Mass.; Clark University Press.
Subscription $7.00. 500 pages annually. Ed. by John Dewey and Carl Murchison.
Political, racial, and differential psychology. Founded 1929.











The Psychology of the Common Branches

By William H. Pyle

“Doctor Pyle has produced a very useful, teachable, and productive
contribution to the psychology of teaching. In this book the principles
essential to the effective teaching of reading, spelling, handwriting, and
arithmetic are set forth ... The book is not only one that can be used as
a textbook but it is an exceptionally good reference work.”—Journal of
Educational Research.

“The book is written in clear, non-technical style and should be very
valuable for the student in training or for the teacher who wishes to make a
[illegible] involved in teaching the four skill subjects.”—Bulletin, New York
Society for Experimental Study of Education.

$2.00 plus 8¢ postage

WARWICK AND YORK
Publishers Baltimore, Md.


WEBSTER’S NEW INTERNATIONAL

[Remainder of advertisement illegible.]

G. & C. MERRIAM CO.
Springfield, Mass.


A DETERMINATION OF GENERALIZATIONS BASIC TO THE
SOCIAL STUDIES CURRICULUM

By Neal Billings

Present conventional courses in history, geography, and civics fail to achieve
as adequately as possible the great aims of the social studies. Granting that
existing courses acquaint pupils with many of the main facts of life, still not all of the
important facts are presented nor are the vital inter-relationships indicated.

In this study Professor Billings points the way toward a more effective curriculum
by setting forth the basic generalizations needed by those who are making
curricula in the social studies in order that prospective American citizens may be
taught to think intelligently about the conditions and problems of contemporary life.

$3.00; postpaid $3.12

WARWICK AND YORK, INC.
Publishers Baltimore, Md.







THE MAPLE PRESS COMPANY, YORK, PA.

























